1. INTRODUCTION


The objective of this investigation is to determine the degree
of data compression which can be obtained for ERTS multispectral imagery using a
set of algorithms which produce zero or minimal distortion in the reconstructed
data. The results obtained can be used for determining the feasibility of data
compression as an integral part of future ERTS programs, either for spacecraft
or ground processing applications.

The investigation of data compression techniques for ERTS multispectral
data has been completed and the desired results have been obtained. The study
has shown that the Spectral-Spatial-Delta-Interleave (SSDI) algorithms permit an
average data compression of more than 2:1 for a strictly information preserving
reconstruction and 3:1 or better if a small degree of distortion is permissible
in the reconstructed data. In terms of storage requirements, a 100x100 nmi scene
can be compressed to a single reel of tape for a saving of three tapes per scene.

1.1 BACKGROUND OF THIS STUDY 

The multispectral imaging sensors of ERTS-A generate tens of billions of
bits daily. In future missions this figure will continue to increase as higher
resolution sensors and additional spectral bands are added. Such volumes of data
produce severe problems in communication links, in ground data processing, and in
ground data storage and archiving. In 1970, TRW began an investigation of low
complexity data compression techniques tailored to the characteristics of
multispectral data in order to reduce the magnitude of such data handling problems.
During the in-house study, the class of techniques termed the
Spectral-Spatial-Delta-Interleave (SSDI) algorithms was developed. The SSDI
technique is strictly information preserving: the reconstructed data are
identical to the digitized source data entering the compressor. Such a technique
preserves the data and cannot be criticized by any user as invalidating his data
requirements. Since strictly information preserving techniques faithfully
compress all the input data, including sensor and quantization noises, the degree
of compression obtained is limited. A modification of the SSDI algorithm was
developed to permit a higher degree of compression at the expense of a slight
controlled distortion in the reconstructed data. This essentially information
preserving algorithm yields data acceptable to many users of the data since the
distortion is comparable to the system noise level. Preceding the current
investigation, TRW developed a set of computer programs capable of simulating
these various algorithms for use with a variety of multispectral imaging data
sources. These programs were validated using digital imagery obtained from the
Laboratory for the Applications of Remote Sensing (LARS) C-1 flight data and
Apollo 9 data (S065 experiment). The programs measure pertinent source and
compressed data statistics, generate compressed and reconstructed data for the
various algorithms, and reformat the reconstructed data for the subsequent
generation of photographic imagery.

1.2 STUDY TASKS 

The following 6 tasks were delineated in the Data Analysis Plan as 
necessary to meet the objectives of the investigation: 

1. Modification of the master data compression computer programs which
were previously developed at TRW Systems to enable reformatting of the
bulk MSS digital tapes, selection of the desired segments of the data
to be processed, and generation of the appropriately formatted output
products.

2. Selection of 30 subscenes and 4 full scenes to be processed based on 
the desired object classes and the tapes available from NASA. 

3. Measurement of pertinent MSS data statistics for all scenes processed. 

4. Measurement of desired global and time-varying compressed data
statistics.

5. Generation of reconstructed imagery for selected full scenes, in- 
cluding a scene processed by an essentially information preserving 
algorithm and a scene subjected to simulated channel errors. 

6. Evaluation and interpretation of the investigation.

In order to obtain the body of data required to accomplish these study
tasks, the set of output data products given in Table 1.1 was obtained for each
scene processed.

In addition to these proposed tasks, two further tasks were added to
supplement the study. First, a tape of digitized spacecraft data was obtained
from NASA-GSFC and processed in the same manner as the bulk MSS tapes. This task
serves to compare the compression obtained by the algorithms for both types of
data. Second, an investigation was performed concerning the hardware
implementation of the SSDI/Rice algorithm for spacecraft applications.

Table 1.1. Data Measurements Obtained for Each Scene Processed 


1. MSS Data Statistics 

a. Data mean and variance per band and over all bands. 

b. Cross spectral-spatial correlation.

c. Spectral correlation (joint probability distribution function). 

2. Data Compression Performance 

a. Probability distribution function of first difference obtained 
by the SSDI, SSDIA, and SSDIAM modes. 

b. Probability distribution function of the SHELL, SSDI, SSDIA, and
SSDIAM symbols.

c. Compression achieved by fixed Huffman coding of the scene using
the SHELL, SSDI, SSDIA, and SSDIAM modes.

d. Entropy of these distributions. 

e. Line-by-line time-varying and overall data compression using the 
fixed Huffman, adaptive Huffman, and Rice algorithms on the 
selected compression mode. 

f. Buffering statistics of the Rice code. 

g. Huffman codes associated with the various compression modes.

1.3 SUMMARY OF RESULTS OBTAINED


The investigation yielded a significant amount of data regarding source
statistics, compression statistics, algorithm performance, and hardware complexity
considerations. The key results are summarized below and discussed in depth in
Section 4 of this report.

• Compressed bit rates, averaged over the scene, vary from a minimum of 
1.22 bits/sample to a maximum of 3.747 bits/sample for the strictly 
information preserving algorithms. 

• The compressed bit rates obtained, averaged over all scenes processed, 
are: 

2.99 bits/sample for SHELL/global Huffman 
2.98 bits/sample for SSDI/global Huffman 
2.92 bits/sample for SSDIA/global Huffman 
2.50 bits/sample for SSDIAM/global Huffman 
2.67 bits/sample for SSDI/adaptive Huffman
2.70 bits/sample for SSDI/Rice 

• For well-behaved data, the SSDIA technique gives a lower compressed bit 
rate than the SSDI algorithm. For anomalous data such as that produced 
by a defective sensor, this is not always true. 

• The essentially information preserving SSDIAM produces a significantly 
lower compressed bit rate than the strictly information preserving SSDIA 
algorithm. The effects of such distortion appear minimal when properly 
performed and areas of high detail are well preserved with no slope over- 
load or overshoot. 

• The strictly information preserving algorithms can compress four full 
100x100 nmi scenes to occupy the same number of magnetic tapes currently 
required to store one full scene. An even greater reduction is possible 
with the essentially information preserving algorithms.

• The effect of channel errors is minimal if the channel bit error rate
is at least 10^-[exponent illegible]. Channels with higher error rates can be used if
frequent memory updates are included.

• An implementation of the SSDI/Rice algorithm was developed to illustrate 
the feasibility of operation at rates above 100 Megabits/second with 
moderate complexity. Parallel data compressor units operating on blocks 
of data permit operation at several hundred Megabits/second.

• The SSDI/Rice algorithm is well suited for spacecraft data compression 
applications. The SSDI/Huffman algorithm provides an efficient data 
compression and reconstruction technique suitable for use in ground 
applications.

2. TECHNICAL DISCUSSION OF WORK PERFORMED


Various data compression algorithms were exercised in this study on a
variety of multispectral data sources. The results and conclusions of these
studies are presented below in Sections 3 and 4. This section describes these
algorithms and the analytic/computational tools developed in order to provide
a framework for the discussion of the results obtained.

2.1 DESCRIPTION OF DATA COMPRESSION ALGORITHMS USED 

2.1.1 Spectral-Spatial-Delta-Interleave Algorithm

The Spectral-Spatial-Delta-Interleave (SSDI) algorithm is a method of
data compression, developed for multispectral data, which removes a maximum 
amount of redundancy subject to the constraints of minimizing complexity and 
maximizing operating speed. This compression algorithm first operates on the 
spatial redundancy in each spectral band and then uses the information obtained 
to reduce spectral redundancies between adjacent bands. 

In order to provide a conceptual description of the basic SSDI algorithm
and several of its modifications, a situation in which there are three spectral
bands, $\alpha$, $\beta$, $\gamma$, will be described. Each ground picture element
(pixel) I consists of three quantized spectral components, $I^{\alpha}$, $I^{\beta}$,
and $I^{\gamma}$. Figure 2.1-1 may be helpful in visualizing the quantities involved.

The algorithm proceeds in the following fashion. First, within each
spectral band, the intensity preceding each pixel in the scan direction is
subtracted from that pixel's intensity. (This technique is essentially DPCM,
treating each spectral band separately.) To each pixel, then, there can be
assigned a triple of these differences denoted
$(\Delta\alpha, \Delta\beta, \Delta\gamma)$.

Next, these "deltas" are themselves differenced to obtain second differences
in adjacent spectral bands; viz. $d_A = \Delta\beta - \Delta\alpha$ and
$d_B = \Delta\gamma - \Delta\beta$. Here too, each pixel may be assigned the triple
$(\Delta\alpha, d_A, d_B)$, which provides the same information as the triple
$(\Delta\alpha, \Delta\beta, \Delta\gamma)$. However, due to spectral band
correlation it should be true that, on the average,
$|d_A| + |d_B| < |\Delta\beta| + |\Delta\gamma|$, and the $d_A$ and $d_B$ are
clustered closer to the origin than the first differences $\Delta\beta$ and
$\Delta\gamma$.

Figure 2.1-1. Definition of First and Second Order Differences in SSDI


These differences are transmitted in a manner allowing the original PCM
sensor data to be recovered exactly from the coded sequence. Corresponding
to each pixel, the triple $(\Delta\alpha, d_A, d_B)$ is developed. Given the
preceding pixel intensities, denoted
$(I^{\alpha}_{j-1}, I^{\beta}_{j-1}, I^{\gamma}_{j-1})$, the current intensities may be
obtained by the recursion relationships

$$I^{\alpha}_{j} = I^{\alpha}_{j-1} + \Delta\alpha$$
$$I^{\beta}_{j} = I^{\beta}_{j-1} + \Delta\alpha + d_A$$
$$I^{\gamma}_{j} = I^{\gamma}_{j-1} + \Delta\alpha + d_A + d_B$$
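The differencing and the recursion relationships can be illustrated with a
minimal Python sketch. It is not the TRW simulation program: the function names
are hypothetical, and sending the first pixel of the line unchanged is an
assumption made here so that the decoder has a starting point.

```python
def ssdi_encode(pixels):
    """Form SSDI triples for one scan line.

    pixels: list of (alpha, beta, gamma) integer intensities.
    Returns the first pixel (sent unchanged, an assumption of this sketch)
    and one (delta_alpha, d_A, d_B) triple per following pixel.
    """
    triples = []
    for prev, cur in zip(pixels, pixels[1:]):
        d_alpha = cur[0] - prev[0]          # first difference, band alpha
        d_beta = cur[1] - prev[1]           # first difference, band beta
        d_gamma = cur[2] - prev[2]          # first difference, band gamma
        triples.append((d_alpha, d_beta - d_alpha, d_gamma - d_beta))
    return pixels[0], triples


def ssdi_decode(first, triples):
    """Rebuild the scan line exactly with the recursion relationships."""
    line = [tuple(first)]
    for d_alpha, d_a, d_b in triples:
        a, b, g = line[-1]
        line.append((a + d_alpha, b + d_alpha + d_a, g + d_alpha + d_a + d_b))
    return line


if __name__ == "__main__":
    scan_line = [(10, 12, 5), (17, 10, 10), (15, 12, 14)]
    first, triples = ssdi_encode(scan_line)
    print(triples)                                   # [(7, -9, 7), (-2, 4, 2)]
    print(ssdi_decode(first, triples) == scan_line)  # True
```

Because only integer additions and subtractions are involved, the round trip is
exact, which is the sense in which the algorithm is strictly information
preserving.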

The final step in the SSDI algorithm is the encoding of these differences. 
During this study three methods of encoding were investigated for use with the 
basic SSDI algorithms. They are 1) global Huffman coding; 2) adaptive Huffman 
coding; and 3) Rice encoding. These coding algorithms are described in 
Sections 2.2 and 2.3. 

2.1.2 SSDIA — A Block Averaging Extension to SSDI 

Most data - including the test data used in this study - contain sensor 
and sampling noise. If this noise could be reduced, a higher compression rate 
could be obtained. Two modifications of the SSDI algorithm have been developed 
to ameliorate the noise problem. The first modification is called SSDIA to 
denote that pixel averaging is employed. SSDIA is discussed in this section. 
The second modification, SSDIAM, is discussed in Section 2.1.4. The SSDIA,
like the SSDI, is a strictly information preserving algorithm while the SSDIAM 
permits a degree of controlled distortion in the reconstructed data. 

The SSDIA algorithm is based on the observation that sensor noise is
essentially uncorrelated from pixel to pixel. This fact degrades SSDI
compression since the differential magnitudes can be large when noise on one
pixel is positive while noise on the preceding pixel has a negative value.

On the other hand, the effects of the noise can be reduced by differencing the
current pixel value in each spectral band with the average value of Q preceding
pixel values in the same band since the (uncorrelated) noise will increase the
value of some of these Q pixels while decreasing the value of others. If
these Q pixels are contiguous to the present pixel, a high degree of correlation
should exist between the mean of these adjacent pixels and the current pixel
intensity. An example of the SSDIA algorithm is provided in Figure 2.1-2,
based on the use of four previously transmitted pixel intensities. No future
pixel values are used in the mean evaluation because of the difficulty that
would result in the decoding process. In general, the mean, $\mu$, of Q preceding
pixels would be a weighted sum of these pixels with the weightings a
function of inter-pixel correlations:

$$\mu = \sum_{i=1}^{Q} w_i I_i$$

where $I_i$ are the Q preceding pixel intensities and $w_i$ their weightings.

Thus, SSDIA increases the compression ratio by the averaging of pixels whose
intensities are correlated with $I_j$ but whose noise components are effectively
uncorrelated with $I_j$. Another benefit derived from the SSDIA involves a
decrease in the magnitude of the first differences when a large step in
intensity, such as is produced by an edge in the scanned image, occurs between
two adjacent pixels in the scan line. Such a situation is illustrated in
Figure 2.1-2 between pixels B and C. Note that for the SSDIA the second
differences, $d_A$ and $d_B$, are obtained in the same manner as for the SSDI
algorithm.

Pixel intensities are given for two scan lines in spectral band $\alpha$. The
averaging is performed over a set of four previously computed adjacent
pixels and an equal weighting of 1/4 is given each intensity. The
selected sets for pixels A and C are enclosed in boxes. The compression
improvement for these pixels by averaging is given in the figure.

Figure 2.1-2. Illustration of the SSDIA Averaging Technique
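The effect of averaging on uncorrelated noise can be seen in a small Python
sketch. It makes several simplifying assumptions that are not taken from the
report: the Q preceding pixels are drawn from the same scan line only (the
figure uses two scan lines), the weights are equal, the mean is truncated to an
integer, and the first pixel is predicted from zero. The function names are
hypothetical.

```python
def ssdia_differences(line, q=4):
    """Difference each pixel against the integer mean of up to q preceding pixels."""
    diffs = []
    for j, value in enumerate(line):
        window = line[max(0, j - q):j]            # previously transmitted pixels only
        predictor = sum(window) // len(window) if window else 0
        diffs.append(value - predictor)
    return diffs


def ssdia_reconstruct(diffs, q=4):
    """Invert ssdia_differences; the decoder forms the same integer means."""
    line = []
    for d in diffs:
        window = line[-q:]
        predictor = sum(window) // len(window) if window else 0
        line.append(d + predictor)
    return line


if __name__ == "__main__":
    noisy = [20, 24, 20, 24, 20, 24, 20, 24]      # constant scene plus alternating noise
    plain = [b - a for a, b in zip(noisy, noisy[1:])]
    averaged = ssdia_differences(noisy)[1:]       # skip the first (unpredicted) pixel
    print(sum(abs(d) for d in plain))             # 28
    print(sum(abs(d) for d in averaged))          # 17
    print(ssdia_reconstruct(ssdia_differences(noisy)) == noisy)  # True
```

Since the predictor uses only pixels the decoder has already rebuilt, the
reconstruction stays exact while the difference magnitudes shrink.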

2.1.3 Shell Coding

Another form of coding the triples has been investigated. This method 
is referred to as shell coding for reasons which will become clear. Instead 
of coding each differential component independently, the triple of differences 
is encoded jointly. That is, the triple of differences is mapped into a 
scalar quantity. 


As illustrated in Figure 2.1-3, each pixel corresponds to a point in a
three-dimensional vector space with coordinates equal to the intensities of
each of the three spectral bands. The vector space is assumed to be quantized
into cells. If each intensity is quantized into seven bits, there are $2^{21}$ cells
in this "spectral" vector space. Therefore, each pixel could be completely
described by numbering each cell and assigning to each pixel the number of the
cell which contains the vector tip. This is essentially what is done in
ordinary PCM where the quantized cell coordinates are projected onto the
intensity axis and transmitted in sequence. Differential PCM accomplishes
this by placing a "floating" cube centered on the cell containing the last
pixel intensities, as shown in Figure 2.1-4. The new set of pixel intensities
is then mapped onto the coordinates of the "floating" cube as differential
information.



Figure 2.1-3. Cells Corresponding to Two Picture Elements, $P_1$, $P_2$

Figure 2.1-4. Floating Cube Designation of $P_1$ and $P_2$. Dotted Line
Designates Shell 2 (L=2) in Two Dimensions

A more sophisticated method of transmitting this same information is the
transmission of only a single (digital) number assigned to the particular cell
of the cube within which the differential intensity vector lies. However, if the
number of cells (in any direction away from the center cell) needed to
handle the differential vector is q, the total number of cells
which must be labeled is $(2q+1)^3$ for three spectral bands. For M spectral
bands, the number of cells Q lying on a shell a distance L away from the origin
(0,0,...,0) is

$$Q = (2L+1)^M - (2L-1)^M, \quad \text{for } L \geq 1$$

Shell coding transmits the actual cell label by conveying two pieces of 
information; the shell of the cube on which the differential vector lies and 
the label of the cell in that shell which contains the vector. This technique 
has the potential for achieving good data compression since three quantities 
are mapped into one. 
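The shell geometry can be checked numerically with a short Python sketch. The
shell number is taken as the largest component magnitude of the triple, which
follows from the cube construction; the enumeration order used for the cell
label is an arbitrary choice made here for illustration, and all function names
are hypothetical.

```python
from itertools import product
from math import ceil, log2


def shell_number(triple):
    """Shell L on which a differential vector lies: its largest |component|."""
    return max(abs(c) for c in triple)


def shell_cell_count(L, M=3):
    """Number of cells on shell L for M spectral bands."""
    if L == 0:
        return 1                                   # only the center cell
    return (2 * L + 1) ** M - (2 * L - 1) ** M


def label_bits(L, M=3):
    """Bits needed to label a cell within shell L (at least one bit)."""
    return max(1, ceil(log2(shell_cell_count(L, M))))


def cell_label(triple):
    """Return (shell, index within shell) using a simple lexicographic enumeration."""
    L = shell_number(triple)
    cells = [c for c in product(range(-L, L + 1), repeat=len(triple))
             if shell_number(c) == L]
    return L, cells.index(tuple(triple))


if __name__ == "__main__":
    for L in range(5):
        print(L, shell_cell_count(L), label_bits(L))
    print(cell_label((0, 3, 0)))
```

For three bands the cell counts on shells 1 through 4 are 26, 98, 218 and 386,
which need 5, 7, 8 and 9 bits respectively; these agree with the shell bit
counts tabulated later in this section.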

The average length of the shell code depends on the probability density
of the number of triples lying within each shell. If most levels are
concentrated within the first few shells and the maximum number of shells required
is quite large, then significant data compression is possible. (The shell
probability distribution appears similar to a $\chi^2$ distribution.) Since very
few triples (0,0,0) occur, almost all sequences have at least five bits per
triple. In addition, the number of the shell must also be transmitted, thereby
increasing the code length. The shell levels are Huffman coded according to
the shell distribution.

Neglecting the additional information required to specify the shell 
number (L), the following is a comparison of the number of bits required to 
compress a triple of differential information with shell encoding versus that 
required for the SSDI code. The first column gives the largest (in absolute 
amplitude) component of the triple, the second column gives the number of bits 
required to specify the cell in that shell containing the triple, and the last 
column gives the total length of the SSDI code required to transmit the triple.
Note that the third column contains a range for the total number of bits
required. This is because the total SSDI code length is variable, depending
on what the other two levels in the triple are for a given maximum level. As
an example, (3, -3, 3) requires 18 bits total while (0, 3, 0) requires only
8 bits in the SSDI even though they both lie on the third shell.


Max. Level      No. Bits      No. Bits
Per Triple      For Shell     For SSDI

    -4              9
    -3              8           8-18
    -2              7           7-15
    -1              5           5-9
     0              1           3
     1              5           4-6
     2              7           6-12
     3              8           8-18
     4              9



Again, as illustrated by the results given in this tabulation, the average
length of the shell code depends upon the shell probabilities. Since very few
triples (0, 0, 0) occur, almost all sequences have at least five bits per
triple. In addition, the shell number L must also be transmitted, which
increases the code length.

The SSDI and Shell algorithms are two methods of source coding the
triples $(\Delta\alpha, d_A, d_B)$ and each is inherently strictly information preserving
but can be easily extended to be essentially information preserving. The
SSDI source encodes the data using a Huffman code based on the statistical 
occurrence of the differentials considered singly while the shell code uses 
the shell statistics to jointly encode the triple of the differentials. 

2.1.4 SSDIM and SSDIAM — Essentially Information Preserving Algorithms 

The SSDIM and SSDIAM algorithms allow the mapping of data intensities
within specified limits in a fashion which increases the compression obtained
while holding the distortion in the reconstructed data to controlled levels.
These essentially information preserving (eip) algorithms utilize a simple
pre-processing of the data to decrease the average magnitude of the SSDI or
SSDIA symbols and tend to average out sensor and quantization noise effects
in the data.

The mapping is performed on the original data sample intensities so that
the resulting reconstructed data cannot deviate from the original data by more
than m quantization levels, where m is a level specified by the user. Values
of m = 1 or 2 are useful for eliminating much of the deleterious effects of
sensor and quantization noise on the compression algorithms without producing
noticeable visual degradation to the reconstructed image, while higher values
produce visual changes with severity depending on the value of m and scene content.
Section 3 will discuss this situation in more detail. In all cases, no intensity
element in the reconstructed data has an error of more than m quantization
levels and the mean square error is always less than [expression illegible] of the
maximum dynamic range, where q is the number of bits per data sample.

The SSDIM mapping is performed over a block of k pixels at a time in the
following fashion. First, the integer-block average of all pixels is formed
within each spectral band. Second, the individual block intensities are shifted
in level toward that mean value, with a maximum shift of m levels up or down.
If the intensity lies at the block mean, its value is unchanged. Following
this operation the conventional SSDI operation is performed on the mapped
intensities. Note that this averaging and mapping operation always decreases the
sum of magnitudes $S_T$ of the SSDI symbols, implying a decrease in variance of
the symbol probability distribution function and a corresponding increase in
compression.

An example of SSDIM operation serves to illustrate the averaging and
mapping process. Assume that the original intensity values in the three spectral
bands are

    $\alpha$    10    17    15      $\mu_\alpha$ = [illegible]
    $\beta$     12    10    12      $\mu_\beta$ = 11
    $\gamma$     5    10    14      $\mu_\gamma$ = 10

With a mapping of m = 1, the following intensities are established

    $\alpha$    11    16    14
    $\beta$     11    11    11
    $\gamma$     6    10    13

With a mapping of m = 4, the following intensities result

    $\alpha$    13    13    13
    $\beta$     11    11    11
    $\gamma$     9    10    10

Based on these three sets of intensities we get the following sequence
of symbols and corresponding values of $S_T$

    SSDI            =  7, -9,  7, -2,  4,  2      $S_T$ = 31
    SSDIM (m = 1)   =  5, -5,  4, -2,  2,  3      $S_T$ = 21
    SSDIM (m = 4)   =  0,  0,  1,  0,  0,  0      $S_T$ = 1
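The worked example can be reproduced with a minimal Python sketch. The report
says only that an integer block average is used; rounding to the nearest
integer is an assumption of this sketch, and the function names are
hypothetical. Individual mapped intensities may therefore differ from the
tabulated ones where the rounding rule matters, but the symbol sequences and
the values of $S_T$ for this example do not depend on that choice.

```python
def block_mean(values):
    """Integer block average, rounded to nearest (an assumption of this sketch)."""
    n = len(values)
    return (2 * sum(values) + n) // (2 * n)


def ssdim_map(values, m):
    """Shift each intensity toward the block mean by at most m levels."""
    mu = block_mean(values)
    return [v + max(-m, min(m, mu - v)) for v in values]


def ssdi_symbols(alpha, beta, gamma):
    """SSDI symbols (delta_alpha, d_A, d_B) for each pixel after the first."""
    symbols = []
    for j in range(1, len(alpha)):
        d_alpha = alpha[j] - alpha[j - 1]
        d_beta = beta[j] - beta[j - 1]
        d_gamma = gamma[j] - gamma[j - 1]
        symbols += [d_alpha, d_beta - d_alpha, d_gamma - d_beta]
    return symbols


if __name__ == "__main__":
    bands = ([10, 17, 15], [12, 10, 12], [5, 10, 14])
    for m in (0, 1, 4):                            # m = 0 leaves the data unchanged
        mapped = [ssdim_map(band, m) for band in bands]
        symbols = ssdi_symbols(*mapped)
        print(m, symbols, sum(abs(s) for s in symbols))
```

In every case each mapped intensity lies within m levels of the original, which
is the distortion bound stated for the essentially information preserving
modes.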


The SSDIAM is formed in a similar fashion but the mapping average is
performed over the block of pixels in both the current scan line and those in
the preceding scan line. Following the mapping, the conventional SSDIA
operation is performed on these mapped intensities. While the actual reconstructed
data may differ depending upon whether the SSDIM or SSDIAM operation is used,
the distortion bound remains the same. The SSDIAM normally produces a somewhat
higher compression than the SSDIM.

The averaging can be performed over fixed blocks or the block size can
vary adaptively with changing scene characteristics. Fixed block sizes permit
an algorithm of lower complexity but generally provide less compression
than adaptive versions. As the block size increases to a degree where the
block mean differs substantially from the means of subblocks, performance
deteriorates. An adaptive block begins at some minimal block size L and
increases the size, one element at a time, until the mean begins to differ too
widely from the block mean $\mu_L$. For each additional intensity added, the new
block mean can be calculated recursively from the last as

$$\mu_{L+1} = \frac{L}{L+1}\,\mu_L + \frac{I_{L+1}}{L+1}$$

If $|\mu_{L+K}/\mu_L|$ falls outside prescribed bounds, the block size is
truncated to include L + K - 1 pixels. The block size increases in regions of
low data activity and decreases in regions of high data activity. In any
case, whether SSDIM or SSDIAM, these mapping techniques avoid problems of
overshoot and slope overload. Use of the SSDIM or SSDIAM does not entail any
modification in the reconstruction algorithms.
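The adaptive block rule can be sketched in Python as follows. The minimum block
size and the bounds on the mean ratio are illustrative values chosen here, not
figures from the study, and the function name is hypothetical. The sketch
assumes at least `min_len` intensities are available from `start`.

```python
def adaptive_block_length(intensities, start=0, min_len=2, lo=0.9, hi=1.1):
    """Grow a block from `start` until the running mean drifts out of bounds.

    The block begins with min_len pixels (mean mu_L). Each further pixel
    updates the mean recursively; the block is truncated just before the
    pixel whose inclusion puts |mu_{L+K} / mu_L| outside [lo, hi].
    """
    block = intensities[start:start + min_len]
    base_mean = sum(block) / len(block)            # mu_L
    mean, n = base_mean, len(block)
    for value in intensities[start + min_len:]:
        new_mean = (n * mean + value) / (n + 1)    # recursive mean update
        if base_mean == 0 or not (lo <= abs(new_mean / base_mean) <= hi):
            break                                  # keep L + K - 1 pixels
        mean, n = new_mean, n + 1
    return n


if __name__ == "__main__":
    line = [10] * 6 + [40] * 3                     # flat region followed by an edge
    print(adaptive_block_length(line))             # 6
```

In the usage example the block grows across the flat region and stops at the
edge, matching the behavior described for regions of low and high data
activity.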

2.2 HUFFMAN SOURCE CODING 


Several algorithms exist for efficiently coding sources whose statistics 
are known. These techniques have been investigated at TRW and the Huffman code 
was chosen as being the most desirable algorithm for ground processing. The 
Huffman code has all the properties required to ensure unique decoding with 
the minimum number of bits, coding one symbol at a time. Furthermore, it
permits use of a "table look-up" decoding algorithm which can be performed 
rapidly. 

A difficulty encountered in practical applications is the cumbersome 
algorithm required for the classical synthesis of a Huffman code given the 
statistics of the source symbols S. TRW has developed a more efficient tech- 
nique for generation of Huffman codes. The latter algorithm also permits 
grouping of low probability symbols together for simplified decoding. Follow- 
ing a discussion of the classical Huffman code synthesis, the new algorithm 
will be described. 

2.2.1 The Classical Synthesis of Huffman Codes

To explain the classical Huffman code synthesis, consider a source $S^1$
with symbols $s^1_1, s^1_2, \ldots, s^1_q$ and symbol probabilities
$P^1_1, P^1_2, \ldots, P^1_q$. Without loss of generality, it may be assumed that
the symbols are ordered so that $P^1_1 \geq P^1_2 \geq \cdots \geq P^1_q$. The two
least probable symbols $s^1_{q-1}$ and $s^1_q$ may be combined and thought of as a
single symbol with probability equal to $P^1_{q-1} + P^1_q$. Thus, a new source
$S^2$ may be constructed with symbols $s^2_1, s^2_2, \ldots, s^2_{q-1}$ and
probabilities $P^2_1, P^2_2, \ldots, P^2_{q-1}$, again ordered so that
$P^2_1 \geq P^2_2 \geq \cdots \geq P^2_{q-1}$. This reduction can then be repeated to
obtain a sequence of sources $S^1, S^2, \ldots, S^{q-1}$, where $S^{q-1}$ has only
two symbols.

A compact instantaneous binary code for the final reduction $S^{q-1}$ is the
trivial code with words 0 and 1. Working backward from this final reduction,
the Huffman code is inductively synthesized as follows:

Assume that a compact instantaneous code has been found for the source
$S^i$. One of the symbols, say $s^i_j$, is formed from the two least probable
symbols of $S^{i-1}$. These two symbols are $s^{i-1}_{q-i+1}$ and $s^{i-1}_{q-i+2}$.
Each of the other symbols of $S^i$ corresponds to one of the remaining symbols
of $S^{i-1}$. The code for $S^{i-1}$ is formed from the code for $S^i$ thus:

To each symbol of $S^{i-1}$ except $s^{i-1}_{q-i+1}$ and $s^{i-1}_{q-i+2}$, assign
the codeword used by the corresponding symbol in $S^i$. The codewords assigned
to $s^{i-1}_{q-i+1}$ and $s^{i-1}_{q-i+2}$ are formed by adding a 0 and 1,
respectively, to the codeword used for $s^i_j$. An example of the synthesis
procedure for a given source is illustrated in Figure 2.2-1.


Figure 2.2-1. Classical Huffman Code Synthesis

Each symbol $s^1_i$ of the source $S^1$ is thus assigned a codeword of length $l_i$.
The average code length for this source is therefore

$$\bar{L} = \sum_{i=1}^{q} P^1_i\, l_i$$

where $\bar{L}$ satisfies the inequality

$$\bar{L} \geq H = -\sum_{i=1}^{q} P^1_i \log_2 P^1_i$$

where H is the entropy of the source $S^1$.

The difficulty imposed by the classical Huffman synthesis arises in the 
forward flow of the code generation between successive reduced sources. This 
procedure is very inefficient with respect to both storage and time when used 
as the basis of a computer algorithm for coding a source. 
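For reference, the classical reduction procedure can be sketched compactly in
Python. This is a standard heap-based formulation rather than the program
discussed in the report: instead of storing every reduced source and working
backward, it prepends a bit to each member of a group as the group is merged,
which yields the same codeword lengths. The function names are hypothetical.

```python
import heapq
from itertools import count
from math import log2


def huffman_code(probabilities):
    """Return {symbol: codeword} for a dict {symbol: probability}."""
    if len(probabilities) == 1:
        return {sym: "0" for sym in probabilities}
    tie = count()                                  # tie-breaker so lists are never compared
    heap = [(p, next(tie), [sym]) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    codes = {sym: "" for sym in probabilities}
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)        # least probable (reduced) symbol
        p1, _, group1 = heapq.heappop(heap)        # next least probable
        for sym in group0:
            codes[sym] = "1" + codes[sym]
        for sym in group1:
            codes[sym] = "0" + codes[sym]
        heapq.heappush(heap, (p0 + p1, next(tie), group0 + group1))
    return codes


def average_length(probabilities, codes):
    return sum(p * len(codes[sym]) for sym, p in probabilities.items())


def entropy(probabilities):
    return -sum(p * log2(p) for p in probabilities.values() if p > 0)


if __name__ == "__main__":
    source = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
    code = huffman_code(source)
    print(code)
    print(average_length(source, code), entropy(source))
```

For the four-symbol source in the usage example the codeword lengths are 1, 2,
3 and 3 bits, and the average length is no smaller than the source entropy, as
the inequality above requires.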

2.2.2 An Improved Huffman Algorithm for Computers 

The new algorithm separates the source reductions from the code synthesis.
The first part of the algorithm keeps track of the number of times each symbol
in the original source is grouped during the sequence of source reductions.
This contains all information about the length of the codeword assigned to
that symbol in the resulting Huffman code. The second part of the algorithm
uses these lengths, $l_i$, to generate a Huffman code C for the source S.

Note that the resulting Huffman code may or may not be identical to the
code generated by the classical synthesis procedure. Nonetheless, the average
code length is identical. Using the classical technique, many different Huffman
codes can also be generated, depending on the assignment of 0 and 1 in each
reduced source.

The method begins with the same source reductions as in the classical
Huffman code synthesis described above with the addition of a final reduction
to source $S^q$ containing only one word. The code lengths may be determined as
follows:





If the symbols $s^i_j$ and $s^i_{j-1}$ are combined to form $s^{i+1}_k$, the symbol
$s^{i+1}_k$ may be considered a reduction of each of the symbols $s^i_j$ and
$s^i_{j-1}$. Assign to each symbol $s_i$ of the original source the length $l_i$,
initialized to zero. Each time a symbol, or any of its subsequent reductions, is
reduced, the value of $l_i$ is incremented by one. It is perhaps easier to
understand this algorithm by referring to the example in Figure 2.2-2.



              Initial    After reduction
                         1    2    3    4    5

    $l_1$        0       0    0    0    0    1
    $l_2$        0       0    0    0    1    2
    $l_3$        0       0    0    1    2    3
    $l_4$        0       0    1    2    3    4
    $l_5$        0       1    2    3    4    5
    $l_6$        0       1    2    3    4    5

Figure 2.2-2. Determination of Code Word Lengths, $l_i$


The second part of the algorithm is illustrated in Figure 2.2-3. This
part of the algorithm operates as follows:

1. The lengths $l_i$ are ranked in order of increasing length.

2. The symbol of minimum length, $l_1$, is assigned $l_1$ zeros.

3. Each successive symbol $s_m$ has a code formed as

   $C_m = (C_{m-1} + 1) + (l_m - l_{m-1})$ zeros.

    SYMBOL    LENGTH, $l_i$    OPERATION                   CODEWORD

    $s_1$          1           assigned zeros              0
    $s_2$          2           0 + 1 and 1 shift           10
    $s_3$          3           10 + 1 and 1 shift          110
    $s_4$          4           110 + 1 and 1 shift         1110
    $s_5$          5           1110 + 1 and 1 shift        11110
    $s_6$          5           11110 + 1 and no shift      11111

Figure 2.2-3. Huffman Code Synthesis Using Code Word Lengths $l_i$


This algorithm is very fast and essentially separates the problem of code 
generation from that of source reduction. The only information which need be 
stored from the source reduction portion of the algorithm is the vector of 
code lengths. 
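The two parts of the algorithm can be sketched in Python as follows. This is an
illustration of the idea described in this section, not the TRW program: the
function names are hypothetical and a heap is used here in place of repeated
re-ordering. The first function counts how many reductions each original symbol
takes part in; the second assigns codewords from the lengths alone by adding
one to the previous codeword and appending zeros.

```python
import heapq
from itertools import count


def code_lengths(probabilities):
    """Part 1: length of each codeword = number of reductions the symbol is in."""
    tie = count()                                  # tie-breaker for equal probabilities
    heap = [(p, next(tie), [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    while len(heap) > 1:                           # continue to a single final word
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        for i in group0 + group1:
            lengths[i] += 1
        heapq.heappush(heap, (p0 + p1, next(tie), group0 + group1))
    return lengths


def codewords_from_lengths(lengths):
    """Part 2: build codewords in order of increasing length."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [None] * len(lengths)
    value, prev_len = 0, lengths[order[0]]         # shortest symbol: all zeros
    codes[order[0]] = format(value, "0{}b".format(prev_len))
    for i in order[1:]:
        value = (value + 1) << (lengths[i] - prev_len)   # add 1, then shift in zeros
        prev_len = lengths[i]
        codes[i] = format(value, "0{}b".format(prev_len))
    return codes


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.03125]
    lengths = code_lengths(probs)
    print(lengths)                                 # [1, 2, 3, 4, 5, 5]
    print(codewords_from_lengths(lengths))
```

With the six-symbol source in the usage example, the lengths are 1, 2, 3, 4, 5
and 5, and the resulting codewords are 0, 10, 110, 1110, 11110 and 11111, the
same as those in Figure 2.2-3.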

2.2.3 Low Probability Symbol Grouping 

Often the total number of symbols in source $S^1$ is quite large and many
of these symbols have probabilities of a small fraction of one percent. To
save time in the encoding/decoding process at the expense of a small increase
in average code length, these low probability symbols can be lumped into a
single symbol. As an example, after ordering symbols with decreasing
probability of occurrence, the first J symbols are directly coded, where

$$\sum_{i=1}^{J} P_i \geq 0.99$$

The remaining symbols, having a total probability $P_{J+1}$ of one percent or less,
are grouped into symbol $S_{J+1}$. If M symbols are lumped into $S_{J+1}$, R bits must
be used to describe these M symbols, where $R = \{\log_2 M\}$*. During transmission,
codeword $C_{J+1}$ is followed by R bits to describe which of the M symbols occurred.
The average code length is lengthened by such a grouping by less than $P_{J+1} R$.

The advantage of grouping symbols which seldom occur is that the maximum
length of any code word can be held to some predetermined length N. This
simplifies the decoding algorithm and keeps the length of the required look-up
table to length $2^N$. These advantages in decoding are obtained at the possible
expense of a slightly increased average code length.


* { } means next larger integer.

During the encoding process, whenever one of the grouped symbols occurs
the compressor transmits the sequence of bits forming code word $C_{J+1}$ followed by
R bits to describe which grouped symbol occurred. When the decoder encounters
code word $C_{J+1}$, it uses the next R bits to decode this grouped symbol.
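The grouping rule can be illustrated with a brief Python sketch. The one
percent threshold follows the example in this section; the function name is
hypothetical, and the "next larger integer" of $\log_2 M$ is taken here to be
the ceiling, which is an assumption when M is an exact power of two.

```python
from math import ceil, log2


def group_low_probability(probabilities, tail=0.01):
    """Split an ordered source into J directly coded symbols and one grouped symbol.

    probabilities: symbol probabilities in decreasing order.
    Returns (J, M, R, bound), where M symbols are lumped together, R bits
    identify a symbol within the group, and bound = P_group * R is the
    upper limit on the increase in average code length.
    """
    total, J = 0.0, 0
    while J < len(probabilities) and total < 1.0 - tail:
        total += probabilities[J]
        J += 1
    M = len(probabilities) - J
    R = ceil(log2(M)) if M > 1 else 0
    p_group = sum(probabilities[J:])
    return J, M, R, p_group * R


if __name__ == "__main__":
    probs = [0.5, 0.3, 0.15, 0.045, 0.002, 0.001, 0.001, 0.001]
    print(group_low_probability(probs))
```

In the usage example the first four symbols carry 99.5 percent of the
probability, so the remaining four are grouped and identified by two extra
bits, at a cost of at most about 0.01 bit per symbol on average.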

2.2.4 Computer Program 

A computer program has been developed and tested which accepts an array 
of symbols and generates the Huffman code. The program allows the operator 
to group symbols if desired and generates the grouped Huffman code and the 
average bit rate if R bits are used to separate the lumped symbols. 

The flowchart describing the program is given in Figure 2.2-4. The inputs
required are the source symbols $S_i$, their associated probabilities $P_i$, and the
maximum acceptable codeword length N. The program outputs the Huffman coded
Table HUF, which contains the coded bit stream $C_i$ associated with the source symbol $S_i$.

Two major subroutines are used in this program. Subroutine ORDER
reorders the symbols and their probabilities in decreasing order so that the
most probable symbols are at the top of an array O. Subroutine GROUP adds the
two least probable symbols in the array O to form a source reduction. This
subroutine also keeps count of the number of source reductions performed and
keeps track of the original source symbols which have been combined to form
each reduced symbol. Each symbol is given a bit position in an array V. If
symbols $s_1$, $s_3$ and $s_5$ have been combined in a source reduction, that reduced
symbol is represented in V as the binary word (....10101). This
representation allows a compact designation of groupings at each stage in the
reduction.
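The bit-position bookkeeping can be illustrated as follows. This is a minimal sketch with hypothetical helper names; it only assumes, as the text states, that symbol $s_i$ owns bit position i of a word in V.

```python
def symbol_mask(indices):
    """Pack 1-based symbol indices into one integer, one bit per symbol."""
    mask = 0
    for i in indices:
        mask |= 1 << (i - 1)
    return mask


def merge(mask_a, mask_b):
    """Combine two reduced symbols: the union of their member sets."""
    return mask_a | mask_b


def members(mask):
    """Recover the 1-based indices of the source symbols in a reduced symbol."""
    return [i + 1 for i in range(mask.bit_length()) if (mask >> i) & 1]


reduced = merge(symbol_mask([1]), symbol_mask([3, 5]))
print(format(reduced, "b"))   # binary word for the group s1, s3, s5
print(members(reduced))
```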

In operation, the program takes the array of input symbols and their
probabilities, calls ORDER to rank them, and combines the M least probable
symbols to form the grouped symbol $S_{J-M+1}$ of probability

$$P_{J-M+1} = \sum_{i=J-M+1}^{J} P_i$$

(assuming $P_1 \ge P_2 \ge \cdots \ge P_{J-1} \ge P_J$). This new set of J-M+1 symbols forms the
input to the basic algorithm in which successive calls to subroutines GROUP
and ORDER generate successive source reductions until only two reduced symbols
remain. At each stage of the reduction, array LENGTH is updated by one for
each symbol in S which has been combined to form one of the reduced symbols
which have been grouped in that step.

Following the reduction process, the array LENGTH is used to compute the
binary codeword associated with all of the J-M+1 non-grouped source symbols.
LENGTH is re-ordered so that the most probable symbols which have the shortest 
code lengths are at the top of the array. A test takes place after LENGTH is 
re-ordered. If the longest codeword exceeds N bits, more source symbols are 
grouped and the source reductions performed again until the maximum codeword 
length is N or less. With 256 source symbols, such an occurrence is guaranteed 
at some stage of grouping. 

The generation of the codes then begins with the minimum length codeword 
and proceeds from word to word with the successive steps of adding 1 to the 
previous codeword and adding the required number of zeros to fill the word. 
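The "add 1, then fill with zeros" step can be sketched as below. This is an illustration of that rule applied to a vector of code lengths, not a transcription of the FORTRAN program; the function name is hypothetical.

```python
def codes_from_lengths(lengths):
    """Assign binary codewords to code lengths, shortest codeword first.

    The first codeword is all zeros. Each later codeword is the previous
    one plus 1, shifted left (zero filled) to reach its own length.
    """
    if not lengths:
        return []
    ordered = sorted(lengths)
    codes = []
    code = 0
    prev_len = ordered[0]
    for n, length in enumerate(ordered):
        if n > 0:
            code = (code + 1) << (length - prev_len)
        codes.append(format(code, "0{}b".format(length)))
        prev_len = length
    return codes


print(codes_from_lengths([1, 2, 3, 3]))
```

Because every codeword is formed by incrementing its predecessor, no codeword can be a prefix of a later one, provided the lengths come from a valid Huffman reduction.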

Table HUF is then generated where all entries corresponding to non-grouped
symbols contain the computed Huffman codeword. For all grouped symbols, the
entry in HUF contains the lumped prefix codeword $C_{J-M+1}$ followed by 8 bits
giving the symbol directly.

2.2.5 Adaptive Huffman Coding 

The adaptive Huffman algorithm used in the TRW simulation program 
produces a new Huffman code for each scan line of data based on the sta- 
tistics of the difference symbols generated for the preceding scan line. 

Two concurrent operations are therefore performed for the processing of a
given scan line: the symbols for this line are encoded using the Huffman
code developed from the statistics of the previous line, and the probability
distribution of symbols generated for the current line is computed.

At the end of the scan line the Huffman coding subroutine is called and the 
new code is computed and stored. The same technique is used for reconstruc- 
tion since the decoding processor has regenerated the previous line of 
symbols and can develop the same Huffman code. 
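A minimal sketch of this line-by-line adaptation follows. It counts bits only and does not emit a bit stream. The function names are hypothetical. Two details are assumptions rather than statements from the text: the first line is coded with its own statistics (as described for DCSTAT2 in Section 2.4.2), and a symbol that did not occur in the previous line costs the longest codeword plus escape_bits raw bits.

```python
import heapq
from collections import Counter


def huffman_lengths(counts):
    """Huffman code length for each symbol of a {symbol: count} table."""
    if len(counts) == 1:
        return {s: 1 for s in counts}
    # The running index n breaks ties so the member lists are never compared.
    heap = [(c, n, [s]) for n, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, g1 = heapq.heappop(heap)
        c2, _, g2 = heapq.heappop(heap)
        # Every source symbol inside the two merged groups gains one bit.
        for s in g1 + g2:
            lengths[s] += 1
        heapq.heappush(heap, (c1 + c2, tie, g1 + g2))
        tie += 1
    return lengths


def adaptive_line_bits(lines, escape_bits=8):
    """Bits per line when each line uses the code built from the line before."""
    bits = []
    prev = Counter(lines[0])          # assumption: first line uses its own code
    for line in lines:
        lengths = huffman_lengths(prev)
        longest = max(lengths.values())
        bits.append(sum(lengths.get(s, longest + escape_bits) for s in line))
        prev = Counter(line)          # statistics for coding the next line
    return bits


print(adaptive_line_bits([[0, 0, 0, 1], [0, 0, 1, 1], [2, 0, 0, 0]]))
```

The decoder can run the same loop because it has already reconstructed the previous line when it needs that line's statistics.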

This technique is very rapid and does not require significant storage 
for performing the required operations. The efficiency of the code generated 
depends upon the correlation of symbol statistics from one scan line to the 
next. Generally, the match of statistics is quite good and certainly
superior to a technique using the symbol statistics from one block of data 
on a scan line to develop a code for use in the succeeding block of data on 
that line. 

This form of adaptive coding yields excellent results in subscene 
processing since the scan lines are relatively short and the subscene 
normally contains a single object class with rather uniform symbol statistics. 

A more efficient coding for large scenes might require breaking each scan line 
into several segments and computing a separate Huffman code on each segment 
for use in encoding symbols on the corresponding segment of the following 
scan line. 

2.3 THE RICE CODING ALGORITHM 

The Rice encoding algorithm, developed by R. F. Rice at JPL [2], is a variable
length coding system which is basically strictly information preserving. Operating
on a sequence of source symbols, the Rice machine adapts by monitoring the data
activity and switching to whichever of three codes best matches that activity.
Code $C\overline{FS}$ (the coded complement of the fundamental sequence FS defined in
Section 2.3.1) performs well with low data activity, code FS performs well for
data of medium activity, and code CFS performs best with very active data. In
order to adapt to rapid changes in activity, the basic Rice compressor monitors
data activity and selects the appropriate code mode based on small blocks of data symbols.

The resulting coding system produces output rates within .3 bits/sample
of the one-dimensional entropy of the samples and cannot expand the data by
more than .1 bits/sample under any circumstance. While the Rice machine can
operate on any source of data, the input to the Rice encoder, as described here,
is the sequence of SSDI, SSDIA, or SSDIAM symbols.

Rice assumes that the adjacent samples of $\Delta$ are statistically independent
and that the probability distributions of these symbols decrease monotonically
on either side of $\Delta = 0$. For his assumed zero-memory source, Rice seeks to
assign the shortest code words to source symbols which have the greatest pro- 
bability of occurrence and the longest code words to those symbols which have 
the least probability of occurrence. This is the same idea underlying the Huffman 
code as described in Section 2.2. 
For each block of J symbols, the entropy of first order linear differences is
given by

$$H(P) = -\sum_{i=1}^{q} p_i \log p_i \quad \text{bits/pixel}$$

where $p_i$ represents the probability of the i-th source symbol and the log
function is base 2. Parameter q can vary from zero to $2^{N+1} - 1$, where N is the
number of bits used for source quantization. Entropy can be considered as the
quantitative measure of the source data activity.

L(P) is the number of bits/pixel required to code the sequence of difference
samples having the distribution $P = \{p_i\}_{i=1}^{2^{N+1}-1}$. Under Rice's assumption
of a zero memory source, the average code length cannot be less than H(P):

$$E[L(P)] \ge H(P)$$

where E denotes the expectation operator.
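As a concrete reading of the entropy definition, the sketch below estimates H(P) for one block from the relative frequencies of its symbols. The function name is hypothetical, and using measured frequencies in place of the true probabilities $p_i$ is an assumption of the example.

```python
import math
from collections import Counter


def block_entropy(symbols):
    """First-order entropy of a block in bits/symbol, from symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


print(block_entropy([0, 0, 1, 1]))    # two equally likely symbols
print(block_entropy([0, 1, 2, 3]))    # four equally likely symbols
```

No code that treats the symbols as independent can average fewer bits per symbol than this value.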

In its full generality, Rice's model assumes that P can change completely from
block to block, and his coding algorithm can change from block to block accordingly,
depending on the distribution P within each block. He measures the system performance
by comparing E[L(P)] with the lower bound H(P). In operation, the Rice algorithm
monitors the data activity of each block of symbols and selects one of three codes,
dependent on the activity range.

2.3.1 The Fundamental Sequence and Code Assignments 

Let $Z_j$ represent the j-th difference sample in a block of J transformed
symbols. This input sequence appears as a sequence of symbols where each
input sample $Z_j$ is associated with some symbol $S_i$ by the following assignments:

       0 → S_1
      +1 → S_2          -1 → S_3
      +2 → S_4          ...
     ...
    +127 → S_254      -127 → S_255

Following Rice [2], a q x J sample matrix can be constructed with elements
$a_{ij}$ given by

$$a_{ij} = \begin{cases} 1 & \text{iff } Z_j = S_i \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, q; \quad j = 1, 2, \ldots, J$$


This is illustrated in Figure 2.3-1 for q = 4, J = 8 and an assumed input
sequence $S_1 S_1 S_2 S_2 S_1 S_3 S_4 S_1$. The fundamental sequence FS is generated by a
"wiggle" operation involving three steps:

1. Cross out all zeros which lie below a 1 in the sample matrix.

2. Cross out any remaining 1's in the last row.

3. Letting $r_i$ denote the residual 0's and 1's remaining in the i-th row,
concatenate the $\{r_i\}$ to form the FS. This operation is illustrated
in Figure 2.3-2 for the example given in Figure 2.3-1 to produce the
fundamental sequence

FS = $r_1 r_2 r_3$ = 11001001110010.

Figure 2.3-2. Fundamental Sequence Generation (FS = 11001001110010)
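The three steps can be sketched without building the matrix explicitly. The function name is hypothetical. The eight-symbol block used below is an assumption: it is the input that yields the 14-bit FS printed above when the three steps are applied.

```python
def fundamental_sequence(indices, q):
    """Fundamental sequence for a block of 1-based symbol indices (1..q).

    Row i of the sample matrix keeps only the columns that have no 1 in a
    higher row (step 1), so each row is formed over the samples not yet
    matched. Row q contributes nothing: its 1's are crossed out (step 2)
    and its zeros all lie below a 1.
    """
    remaining = list(indices)
    fs = []
    for i in range(1, q):
        fs.extend("1" if z == i else "0" for z in remaining)
        remaining = [z for z in remaining if z != i]
    return "".join(fs)


print(fundamental_sequence([1, 1, 2, 2, 1, 3, 4, 1], q=4))
```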
While not necessarily the best code for the FS, the code table used by
Rice is given in Table 2.3-1. Code word designation l_i corresponds to the
i-th word of a variable length binary code. As shown in the table, an 8-word
variable length code is used. Coding the FS (or its complement) means assigning
one of the code words in Table 2.3-1 to each sub-sequence of three binary digits
making up the FS. Each sub-sequence of three bits is an address to an 8-word
table containing the code. Rice's code word assignment is given in Table 2.3-2.


Table 2.3-1. Variable Length Code

    Code Word Designation    Actual Code Word
    l_1                      0
    l_2                      1 0 0
    l_3                      1 0 1
    l_4                      1 1 0
    l_5                      1 1 1 0 0
    l_6                      1 1 1 0 1
    l_7                      1 1 1 1 0
    l_8                      1 1 1 1 1


Table 2.3-2. Code Word Assignment

    Address    Code Word Assignment
    0 0 0      l_1
    0 0 1      l_2
    0 1 0      l_3
    1 0 0      l_4
    0 1 1      l_5
    1 1 0      l_6
    1 0 1      l_7
    1 1 1      l_8
As an example, use the FS previously determined. A dummy zero is added
at the end of the FS so that its length is divisible by 3. CFS and
$C\overline{FS}$ are given here (the complement $\overline{FS}$ is formed from the 14-bit FS
before the dummy zero is added, so its last group is 010):

    FS:                 (110)    (010)    (011)    (100)    (100)

    CFS:                (11101)  (101)    (11100)  (110)    (110)

    $\overline{FS}$:    (001)    (101)    (100)    (011)    (010)

    $C\overline{FS}$:   (100)    (11110)  (110)    (11100)  (101)

In this example, FS is of length 15 bits, CFS is of length 19 bits, and
$C\overline{FS}$ is of length 19 bits.

Letting F be the length of the block FS normalized to bits/sample, the
decision as to mode used is given by

    $C\overline{FS}$  if 1 ≤ F < 1.5
    FS    if 1.5 ≤ F < 3
    CFS   if F ≥ 3

Figure 2.3-3 illustrates the basis of these range selections and shows how
close each coding mode lies to the entropy curve. For the example sequence
of eight symbols, F = 15/8, implying that FS should be transmitted.
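The table look-up and the mode decision can be sketched as follows. The dictionary combines Tables 2.3-1 and 2.3-2. The function names and the mode label "CFS_BAR" (standing for the coded complement of FS) are hypothetical, and the treatment of the range boundaries as closed on the lower side is an assumption.

```python
# Three-bit address -> code word (Tables 2.3-1 and 2.3-2 combined).
CODE_WORDS = {
    "000": "0",     "001": "100",   "010": "101",   "100": "110",
    "011": "11100", "110": "11101", "101": "11110", "111": "11111",
}


def complement(bits):
    """Bitwise complement of a string of 0's and 1's."""
    return "".join("1" if b == "0" else "0" for b in bits)


def code_triples(bits):
    """Pad with dummy zeros to a multiple of 3, then code each 3-bit group."""
    padded = bits + "0" * (-len(bits) % 3)
    return "".join(CODE_WORDS[padded[i:i + 3]] for i in range(0, len(padded), 3))


def select_mode(fs_bits, n_samples):
    """Choose the block mode from the normalized FS length F."""
    f = fs_bits / n_samples
    if f < 1.5:
        return "CFS_BAR"
    if f < 3:
        return "FS"
    return "CFS"


fs = "11001001110010"
print(len(code_triples(fs)), len(code_triples(complement(fs))))
print(select_mode(15, 8))
```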

2.3.2 Split-Pixel Modes 

An additional set of operating modes can be added to the basic compressor
to extend efficient performance to data sources with entropy outside the principal
operating range, i.e., very active data.

The basic compressor operates independently of the level of quantization and
can be made to treat m-bit data as n-bit data (m ≥ n) simply by shifting out the
m-n least significant bits of each data sample. The split-pixel option allows
the basic compressor to compress only the n most significant bits of each m-bit
sample and to transmit directly the k most significant of the remaining m-n
bits. Each such split-pixel mode is designated by the notation (n,k). Note that
if n+k < m, distortion can occur when the data is reconstructed. Thus, the Rice
algorithm can be extended to become essentially information preserving through
such an operation of the split-pixel modes.
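The bit splitting behind an (n,k) mode can be sketched as below. The function names are hypothetical, and the reconstruction simply zero-fills any bits that were not transmitted, which is one possible choice and not necessarily the one used in the study.

```python
def split_pixel(sample, m, n, k):
    """Split an m-bit sample into its n MSBs and the next k bits below them."""
    msb = sample >> (m - n)
    residual = (sample >> (m - n - k)) & ((1 << k) - 1)
    return msb, residual


def rejoin(msb, residual, m, n, k):
    """Rebuild the sample; the m-n-k untransmitted bits come back as zeros."""
    return (msb << (m - n)) | (residual << (m - n - k))


# (4,3) mode on 7-bit data: n + k = m, so reconstruction is exact.
parts = split_pixel(0b1011011, m=7, n=4, k=3)
print(parts, rejoin(*parts, m=7, n=4, k=3))

# (4,2) mode on 7-bit data: n + k < m, so the last bit is lost.
parts = split_pixel(0b1011011, m=7, n=4, k=2)
print(parts, rejoin(*parts, m=7, n=4, k=2))
```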


Figure 2.3-3. Dynamic Performance Curves for 0 < H < 4 (abscissa: H(P), bits/pixel)


Split-pixel performance versus source entropy is given in Figure 2.3-4. 

As shown, one of the (n,k) modes lies close to the entropy line for 4 ≤ H < 8.

By computing the entropy of the block before encoding it, the decision as to which, 
if any, split-pixel mode to use can be made. Alternatively, since adjacent scan 
lines are normally highly correlated it is very likely that the particular (n,k) 
mode best for one line would also be best for the next. This technique, used in 
the TRW simulation, computes the entropy of each scan line to determine the best 
(n,k) mode and uses that mode for encoding the next line. In general, this tech- 
nique comes close to optimality and is definitely easier to implement than the 
first technique mentioned. 
Figure 2.3-4. Split-Pixel Performance (abscissa: H, bits/pixel)

As shown in Figure 2.3-4, not much compression is lost if the mode used
is slightly wrong. As an example, if the block entropy is 6 bits/sample and
the (6,2) mode is used instead of the optimal (5,3) mode, the data rate only
increases by .15 bits/sample. Thus, not all possible modes need be provided
in the compressor, simplifying both encoding and decoding operations. In the
TRW simulation only three split-pixel modes are used. The entropy H of a segment
of data is computed and the mode selected is determined as follows:

    ENTROPY          MODE
    H ≤ 4            (7,0)
    4 < H ≤ 5        (6,1)
    5 < H ≤ 6        (4,3)
    H > 6            (3,4)
The (7,0) mode is not a split-pixel mode but the range of the basic compressor.
Since the n most significant bits can be transmitted as FS, CFS, or $C\overline{FS}$, there
are a total of twelve possible Rice modes allowed in the simulation.
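The entropy thresholds and the one-line lag used in the simulation can be sketched as follows. All names are hypothetical. Starting the first scan line in the (7,0) mode is an assumption, since the text does not say which mode is used before any entropy has been measured.

```python
SPLIT_MODES = [(7, 0), (6, 1), (4, 3), (3, 4)]
BLOCK_CODES = ["FS", "CFS", "CFS_BAR"]   # CFS_BAR: coded complement of FS


def select_split_pixel_mode(entropy):
    """Map a measured entropy (bits/sample) to the (n,k) mode of the table."""
    if entropy <= 4:
        return SPLIT_MODES[0]
    if entropy <= 5:
        return SPLIT_MODES[1]
    if entropy <= 6:
        return SPLIT_MODES[2]
    return SPLIT_MODES[3]


def modes_for_lines(line_entropies, first=(7, 0)):
    """Mode for each scan line, chosen from the entropy of the line before it."""
    modes = [first]
    modes.extend(select_split_pixel_mode(h) for h in line_entropies[:-1])
    return modes


print(modes_for_lines([3.0, 4.5, 6.2]))
print(len(SPLIT_MODES) * len(BLOCK_CODES))   # number of possible Rice modes
```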

2.3.3 Data Formatting 

Since each block of data can be transmitted as FS, CFS, or $C\overline{FS}$, two block
identification bits (BID) must be inserted before each block of compressed data.
These two bits enable the decoder to know the mode used. Since each line will be
encoded in one of four split-pixel modes, two line identification bits (LID) are
inserted at the beginning of each new scan line. Following the two LID bits are
the four intensity values which comprise the spectral components of the first
element in the scan line. For split-pixel operation, following the two BID bits
are the variable length Rice encoded n most significant data bits of the block.
Following the Rice sequence are the k least significant bits of the block. The
data format for the beginning of a scan line and the first (n,k) block is given
below for a block of length L.

Several comments on the operation of program CRICE lend further 
insight into the techniques employed. First, the fundamental sequence 
(FS) is generated by the following steps which simulate the Rice wiggle 
operation: 

(a) Have vector of 64 symbols stored in IRS.

(b) Set parameter IS to zero initially. On successive passes, IS = 1,
-1, 2, -2, etc.

(c) Generate the successive bits of fundamental sequence in IFS with
a one bit for each entry of IRS where symbol level equals IS and
a zero bit for each entry with a different symbol level.

(d) Keep track of location of symbols in IRS which produced a one bit
in IFS. After each pass through IRS, reorder IRS by eliminating
these symbol levels.

(e) Continue steps c and d, continually incrementing parameter IS
until no symbols remain in IRS. At this point the fundamental
sequence for the block is contained in vector IFS. If mode $C\overline{FS}$
is to be used, each entry in IFS is complemented.
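Steps (a) through (e) can be sketched as below. This is an illustration in Python rather than the CRICE source; the function names are hypothetical, and the input is assumed to be a list of integer symbol levels.

```python
def levels():
    """Yield symbol levels in the pass order 0, 1, -1, 2, -2, ..."""
    yield 0
    n = 1
    while True:
        yield n
        yield -n
        n += 1


def crice_fundamental_sequence(irs, complemented=False):
    """Build IFS by repeated passes over the symbols still held in IRS."""
    irs = list(irs)
    ifs = []
    for level in levels():
        if not irs:
            break
        # Step (c): one bit per remaining entry, 1 where the level matches.
        ifs.extend(1 if s == level else 0 for s in irs)
        # Step (d): drop the entries that produced a one bit.
        irs = [s for s in irs if s != level]
    if complemented:
        ifs = [1 - b for b in ifs]
    return ifs


print(len(crice_fundamental_sequence([0, -2, 2, -1, 5])))
```

For the five symbols shown, the sequence has 23 bits, which agrees with the length formula used in the DCSTAT2 simulation.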
Overhead bits are transmitted at the beginning of each new scan line
with the following format:

    Field                                 Length
    Scan Line Identification              2 bits
    Mode (SSDI/SSDIA)                     1 bit
    Initial Intensity Value, Band 1       7 bits
    Initial Intensity Value, Band 2       7 bits
    Initial Intensity Value, Band 3       7 bits
    Initial Intensity Value, Band 4       7 bits
    Mode of Split-Pixel                   2 bits

    New Scan Line Format


Depending on whether or not a split-pixel mode is being used for a 
block, two formats exist for the compressed data in a block. 


    Rice Mode    Compressed Block Bits
    2 bits       variable

    Non Split-Pixel Block Format

    Rice Mode    Compressed (n) Block Bits    (k) Residual Bits
    2 bits       variable                     64 k bits

    Split-Pixel (n,k) Block Format

The initial two bits contained at the beginning of each block or line
format are specified as follows:

    00 → FS

    01 → CFS

    10 → $C\overline{FS}$

    11 → new scan line
2.3.4 The DCSTAT2 Simulation of the Rice Algorithm


The Rice simulation in DCSTAT2 given in section 2.5.2 computes the number
of bits required for a Rice encoding of the transformed data. In order to
minimize the time required for the simulation, several shortcuts are used which
introduce no error in the computation but are faster than the classical synthesis
of the Rice code.

The entropy H of the input symbols is computed for each scan line and used to
determine the split-pixel mode to use for the next scan line. The block size used
is sixteen ground elements, which implies 64 samples per block from all four spectral
bands. For each block the FS is computed not by the Rice "wiggle" operation
previously described but by recognition of the fact that the length of the FS (LFS)
can be obtained by the algorithm:

$$\mathrm{LFS} = \sum_{i=1}^{L} \left( 2\,|S_i| + x_i \right), \qquad x_i = \begin{cases} 0 & \text{if } S_i > 0 \\ 1 & \text{if } S_i \le 0 \end{cases}$$

where L is the number of samples in the block. Thus, for symbols 0, -2, 2, -1, 5
we obtain LFS = 1+5+4+3+10 = 23. The normalized
length of FS is given by LFSN = LFS/64 and LFSN is used to compute the Rice mode
to use for the most significant n bits of the data samples in that block. If
LFSN < 1.5 use $C\overline{FS}$, if 1.5 ≤ LFSN < 3.0 use FS, and if LFSN ≥ 3.0 use CFS.

If either CFS or $C\overline{FS}$ is used, the table given in Table 2.3-1 is used to
determine the length of the coded sequence assigned to each block. Two more bits
(BID) are added to each block total and a running total (NBRICE) of all bits
required for encoding the samples in the scan line is maintained. At the end of
each scan line this total is added to the thirty bits/scan line required for
transmitting the LID and the first element values. Dividing this total by the
number of samples in the scan line gives the average number of bits/sample required
to Rice encode that line. In addition, DCSTAT2 computes the percentage of the
time that the various Rice modes were used.
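The shortcut length computation and the per-line accounting can be sketched as follows. The function names and argument names are hypothetical. The value x = 1 for a zero symbol is inferred from the worked example, in which symbol 0 contributes one bit.

```python
def fs_length(symbols):
    """Length of the fundamental sequence without generating it.

    Each symbol s contributes 2|s| bits, plus one more bit when s <= 0.
    """
    return sum(2 * abs(s) + (0 if s > 0 else 1) for s in symbols)


def line_average(block_bits, n_samples, bid_bits=2, line_overhead=30):
    """Average bits/sample for a scan line.

    block_bits holds the coded length of each block. Every block also
    carries bid_bits of identification, and the line carries
    line_overhead bits for the LID and the first element values.
    """
    total = sum(b + bid_bits for b in block_bits) + line_overhead
    return total / n_samples


print(fs_length([0, -2, 2, -1, 5]))
print(line_average([100, 120], n_samples=128))
```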
2.4 COMPUTER PROGRAMS


The computer programs used to simulate the basic compression algorithms
and to obtain the various statistical characterizations of the input data
and the compressed data are discussed in this section. The
programs are flexible to permit the user selection of any or all of a number of
different options. All programs have been written in FORTRAN IV or COMPASS and
run on a CDC 6500 computer. Furthermore, to provide flexibility, the overall
program was divided into several subprograms which can be used separately or in
sequence. An overview of the data flow structure is provided by Figure 2.4-1.
The interrelation and flow of the various functional programs is presented in
Figures 2.4-2 through 2.4-5, including input and output data tapes.



Figure 2.4-1. Overall Data Handling Flow Diagram

Figure 2.4-2. Data Flow Between DCSTAT1 and DCSTAT2

Figure 2.4-3. Huffman Compression and Reconstruction Flow Diagram
The program DCSTAT1 begins by extracting the data from the MSS tape and
reformatting it. It then uses the appropriately formatted MSS data to compute
scene statistics and compressed data statistics for the various compression
algorithms selected. DCSTAT1 generates a tape containing the sequence of transformed
differences obtained by the selected version of the SSDI, and generates the Huffman
code to be used on the scene.

DCSTAT2 uses the tape generated by DCSTAT1 and computes the time varying 
compression and buffer statistics resulting from the global Huffman, adaptive 
Huffman, and Rice encoding of the scene. DCSTAT2 also generates other character- 
izations of these data compression techniques for the selected scene. A two pass 
organization was required because the global Huffman code can be developed only 
after the probability density function has been measured for the entire scene. 

For Huffman encoded data, program RSTHUF simulates the selected data
compression technique by generating a compressed bit stream and reconstructing the
data. This program operates on the tapes of difference symbols generated by DCSTAT1.
Program BLDTAB generates the look-up table used by RSTHUF, the table being based
on the Huffman code generated by DCSTAT1. Program PKHUF generates the Huffman
coded data tape. The packed tape is a 7 track, 800 BPI, Fortran binary tape
written in external format. The data is a continuous bit stream broken up into
records of 288 60-bit words (2160 bytes).

Regardless of the coding algorithm used, the output of the reconstruction
is four data tapes, each containing one spectral band. These contain the
reconstructed data for a scene comprising an area twenty-five nmi square. These four
tapes are then packed to obtain TAPE 4 which contains the reconstructed data with 
spectral bands packed as shown in Figure 2.4-5. 

The final step is the construction of the photographic copy of the 
imagery. TAPE 4 containing the images is processed on a General Dynamics L70 
laser printer/scanner. The L70 is an eight-bit laser film writer with 256 
intensity (grey) levels. The film writer constructs photographic negatives 
from which positive prints are then made at TRW. Note that each negative con- 
tains all four spectral bands, formatted as shown in Figure 2.4-5, to prevent 
negative and print processing variations which could otherwise occur. This 
technique also provides all spectral components of a scene on the same print 
to facilitate comparisons. 
2.4.1 Program DCSTAT1


Program DCSTAT1 uses the MSS data tapes to generate the various statistical 
measurements of the input and SSDI-transformed data as well as tapes of the SSDI 
symbols. DCSTAT1 gives the operator flexibility in selecting options for each run. 
The output tape can be based on the SSDI, SSDIA, or the SSDIAM algorithms. 

The various statistical outputs which can be selected are: 

• Data mean and variance in each spectral band and over all
bands.

• First difference pdf in each spectral band for the SSDI, SSDIA,
and SSDIAM transforms.

• Joint spectral-spatial correlation along the scan lines.

• Overall pdf for SSDI, SSDIA, SSDIAM, and Shell symbols.

• Huffman code for SSDI, SSDIA, SSDIAM, and Shell symbols.

• Scene entropy and average code length for SSDI, SSDIA, SSDIAM,
and Shell transforms.

The flow of DCSTAT1 is given in Figure 2.4-6. The program is initialized
and the various input parameter options are entered through namelist INPUT.
These parameters (with default values indicated in parentheses) are:

• KSTOP - No. of scan lines in scene (100)

• JSTOP - No. of pixels in each scan line (100)

• BLUR - Blur option = "ON" or "OFF" (OFF)

• JUMP - Correlation step sizes (1, 2, 4, 6, 8)

• BANDS - Band pairs used for probability ellipse (1, 2)

• CTAPE - Tape output = "ON" or "OFF" (ON)

• IMODE - Tape output symbols = "SSDI", "SSDIA", or "SSDIAM" (SSDI)

• IPROB - pdf computation = "ON" or "OFF" (ON)

• ICOR - correlation option = "ON" or "OFF" (ON)

• IELPS - ellipse option = "ON" or "OFF" (ON)

• ICOMP - compression option = "ON" or "OFF" (ON)

• NSKIP - No. of initial records to skip

• ISLS - Starting scan line
• ISLE - Ending scan line

• IPS - Starting pixel location
• IPE - Ending pixel location
These last five parameters are used by subroutine EXTMSS which extracts
and unpacks the MSS data from the input tape and reformats the data to a form
acceptable by DCSTAT1. The flow of subroutine EXTMSS is given in Figure 2.4-7.

Figure 2.4-6. Flow of Program DCSTAT1



Figure 2.4-7. Flow of EXTMSS 


The first three scan lines are read from the reformatted data for each
spectral band. Three lines are initially read in each band to provide the
information required if averaging and blur are to be performed. If BLUR is ON the input
samples are blurred in each spectral band to simulate a decreased sensor resolution
using the equation:

$$\hat{I}(i,j) = .68\, I(i,j) + .28\,[I(i,j-1) + I(i,j+1) + I(i+1,j) + I(i-1,j)] + .04\,[I(i-1,j-1) + I(i-1,j+1) + I(i+1,j-1) + I(i+1,j+1)]$$
Each intensity is used to update vectors S and S2 which will eventually contain
the mean and second moment of the scene.

If CTAPE is ON, the initial intensities at the beginning of the scan line
in each spectral band are written onto the output tape. If ICOR is ON the
spectral-spatial correlation is performed by taking the normalized inner product of the
vectors corresponding to pixels separated on a scan line by the distances given by
the parameter JUMP. The correlation value for each pair of pixels is entered into
array COR. These values will later be averaged over the scene and printed as a
correlation table.
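The correlation measure can be sketched as follows, treating each pixel as the vector of its four band intensities. The function names are hypothetical, and averaging over a single scan line stands in for the scene-wide average kept in the correlation array.

```python
import math


def spectral_correlation(a, b):
    """Normalized inner product of two pixel vectors (one entry per band)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def line_correlation(pixels, jump):
    """Average correlation of all pixel pairs separated by `jump` on a line."""
    values = [spectral_correlation(pixels[j], pixels[j + jump])
              for j in range(len(pixels) - jump)]
    if not values:
        raise ValueError("scan line is shorter than the step size")
    return sum(values) / len(values)


line = [[20, 30, 40, 25], [21, 31, 39, 26], [60, 10, 15, 12]]
for jump in (1, 2):
    print(jump, round(line_correlation(line, jump), 3))
```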

The first differences are generated if ICOMP is ON. The first
differences are obtained in three ways depending on whether SSDI, SSDIA, or SSDIAM
is being simulated. For SSDI, the first differences are obtained by subtracting
successive pairs of intensities in each band, the difference being stored in array
DIF1. For SSDIA, the first differences are obtained by subtracting each intensity
from the average, A, of surrounding intensities, the differences being stored in
array DIF2. For the SSDIAM, each intensity is mapped appropriately and then the
first differences are obtained in the same fashion as used for the SSDIA, the result
being stored in array DIF3. DIF1, DIF2, and DIF3 increment the probability vectors
P1S, P1SA, and P1SAM. Simultaneously, array ELIPS is updated once per set of four
spectral intensities for later use in generating the joint probability ellipsoid.

The second differences are then computed using the appropriate first 
order differences as computed by the SSDI, SSDIA, and SSDIAM. Three second 
differences and the appropriate first difference for spectral band one are used 
to increment the overall probability vectors P2S, P2SA, and P2SAM. These vectors 
are used to determine the Huffman codes for the scene. The difference symbols
forming either the SSDI, SSDIA, or the SSDIAM are written on the output tape if
CTAPE is ON. IMODE determines the compression mode written on the output tape.

The largest symbol difference in magnitude for each set of four 
spectral intensities is then used to determine which shell that set would 
occupy if Shell coding were used. Probability array PSH is then updated, 
corresponding to that shell. PSH is used to determine the optimum Huffman
code for Shell coding.

The above procedure is performed on a pixel by pixel basis. When the
last pixel in a scan line is encountered the next scan line of data is read from
the input tapes, stored, and the above computations are performed. After the
last scan line has been read and all operations performed, computer output
operations begin. First, parameter S is divided by the number of intensities used to
generate the scene mean $\mu$. Parameter S2 is divided by the same number to obtain
the second moment. The scene variance $\sigma^2$ is obtained by subtracting $\mu^2$ from
the second moment. These operations are performed for each spectral band and for
all bands averaged together. The means, variances, and first difference
probabilities are then printed.
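The running-sum computation of the mean and variance can be sketched as follows. The function name is hypothetical; s and s2 play the roles of the accumulators S and S2.

```python
def scene_statistics(intensities):
    """Mean and variance from a running sum and a running sum of squares."""
    s = 0.0
    s2 = 0.0
    n = 0
    for x in intensities:
        s += x
        s2 += x * x
        n += 1
    mean = s / n
    second_moment = s2 / n
    # Variance is the second moment minus the square of the mean.
    return mean, second_moment - mean * mean


print(scene_statistics([1, 2, 3, 4]))
```

Only two accumulators per band are needed, so the statistics can be gathered in the same pass that generates the difference symbols.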

The correlation is printed as a function of the step size for values of
correlation between .71 and 1.00. The joint probability ellipse is printed for
each pair of bands desired. In the printout the joint occurrence (0,0) is
normalized to the value 100 and all other joint probabilities are normalized by
the same factor to present an output product that can be easily interpreted. The
SSDI, SSDIA, SSDIAM, and SHELL symbol probabilities are displayed in the range
[-18, 18].

The remainder of the program generates the entropy and Huffman codes
for the SSDI, SSDIA, SSDIAM, and SHELL encoding modes. All four calls to
subroutine Huffman are alike, the only difference being the probability vector used
to compute the appropriate Huffman code. Given the probability of occurrence of
each symbol over the scene, Huffman returns the sequence of coded bits associated
with each symbol and the average code length. Subroutine Huffman is fully described
in section 2.4.5. Scene entropy is computed for each coding mode from the same
probability vector by the equation:

$$H = -\sum_{i} p(i) \log_2 p(i)$$

Depending on the IMODE specified by the operator, either the SSDI, SSDIA, or
SSDIAM Huffman code table is written onto TAPE16 for use by DCSTAT2 or BLDTAB.

After specifying the various input parameters to DCSTAT1, operation is
automatic and no further operator interaction is required, so that the program can
be submitted either through a terminal or by batch processing. If all options
are ON, program DCSTAT1 requires about 1.2 milliseconds per intensity sample.
2.4.2 Program DCSTAT2

Program DCSTAT2 accepts the tapes containing the SSDI, SSDIA, or
SSDIAM symbols and the Huffman code computed by DCSTAT1 for that scene
and computes the time-varying statistics for the global Huffman, adaptive
Huffman, and Rice encoding. The flow of DCSTAT2 is given in Figure 2.4-8.

Input parameters are accepted via namelist INDC2. These parameters 
are IBUF1, IBUF2, MODE, and PMAX. IBUF1 and IBUF2 denote the buffer out- 
put rate in bits/sample, with default values 3.0 and 3.5. Parameter MODE 
determines the outputs desired and has three values: 

MODE = 1 → Buffer statistics and global Huffman statistics

       2 → Rice and adaptive Huffman statistics

       3 → All of the above (default value)

Parameter PMAX permits varying the total probability of those symbols
included in the lumped grouping for the generation of the adaptive Huffman
code.

Initially, DCSTAT2 reads input tape TAPE16 which contains the code
table, of length 256, associating each input symbol read from TAPE15 to the
global Huffman code as generated in DCSTAT1.

The first scan line of symbols is read from TAPE15 by COMPASS program 
LININ. If MODE is set to either 2 or 3, HPAH is called to compute the 
probability distribution of the transform symbols for that scan line. At 
the end of the scan line this distribution is used by subroutine HUFMAN to 
develop the Huffman code which will be used to encode the symbols from the 
following scan line. This technique is used for all scan lines in the 
adaptive Huffman mode except for the first line, since no a priori information 
is available there. The first line is encoded a posteriori by the Huffman 
code developed for that line. Thus, the first two scan lines of data have 
the same code. In addition, the entropy of the scan line symbols is computed 
by HPAH for use in Rice encoding. The normal call to LININ, reading the next 
scan line, follows B. This read is skipped for the first scan line since it 
has already been read to initiate the adaptive Huffman mode. If an end-of- 
file is encountered by LININ, the program goes to H to print the overall 
compression achieved on the data. 
Figure 2.4-8. Flow of Program DCSTAT2


If no end-of-file is encountered, a check is made for MODE=2. If MODE 
is not 2 the number of bits required for encoding that line by the global 
Huffman code is determined using the stored table. Each transform symbol 
is used to call the code table and the number of bits required to code that 
symbol is returned. After all symbols from that scan line have been converted 
to bits, the total number of bits obtained is added to the number of overhead 
bits required and this sum is divided by the total number of symbols encoded 
to obtain the average number of bits required for that line of data. For the 
global Huffman the overhead per scan line is equal to the number of bits 
required to denote the beginning of a new scan line and the number of bits 
used to send the values of the first intensities in the scan line. The 
first pixel intensities in the line are transmitted directly and not by 
transform symbols in order to prevent propagation of possible errors from 
one scan line to the next line. The format used in DCSTAT2 for denoting the 
beginning of a scan line is the same as that given in the appendix on ground 
processing of images. The prefix code is transmitted followed by a string 
of eight zero bits to initiate the scan line and the next 28 bits give the 
four 7-bit intensity values of the first elements in each spectral band. 
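
The per-line average described above can be sketched as follows. This is 
an illustrative Python sketch, not the DCSTAT2 source; the function name, 
the dictionary form of the code table, and the parameter prefix_bits (the 
length of the prefix code) are assumptions introduced for the example. 

```python
def average_bits_per_symbol(symbols, code_length, prefix_bits):
    """Average bits/symbol for one scan line under a fixed (global) Huffman code.

    symbols     -- transform symbols of the scan line
    code_length -- dict mapping symbol -> code word length in bits (assumed form)
    prefix_bits -- length in bits of the prefix code that opens the line marker
    """
    # Table look-up: each symbol contributes the length of its code word.
    data_bits = sum(code_length[s] for s in symbols)
    # Line overhead: prefix code, eight zero bits, four 7-bit first intensities.
    overhead_bits = prefix_bits + 8 + 4 * 7
    return (data_bits + overhead_bits) / len(symbols)
```

For a realistic scan line the fixed overhead is spread over many symbols 
and contributes only a small fraction of a bit per symbol. 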

If MODE=1, the flow is transferred to G where the compression is 
printed for global Huffman encoding as the average number of bits per pixel 
required for that line. The average number of bits per pixel out of the 
buffer (IBUF) is subtracted from the average number of bits per pixel required 
for compression to yield the average change in the buffer for that line. 
This amount is added to the buffer contents left from the previous line 
to yield the total buffer fill. If IBUF is greater than the average number 
of bits put into the buffer, underflow occurs and a 0 is printed for the 
buffer statistics of that line. In general, IBUF would initially be set 
at about the average bits/pixel required for the scene and variations 
of buffer fullness would be observed. After a run of DCSTAT2, buffer statistics 
may reveal runs of underflow for segments of the scene where data activity 
is much less than the scene average. Also, the more serious problem of 
overflow may result due to areas of the scene where data activity is much higher 
than the scene average. In such cases DCSTAT2 can be rerun with an 
appropriate change in IBUF value. 
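
The buffer bookkeeping above can be illustrated with a short Python 
sketch. It is not the DCSTAT2 code: the function name is hypothetical, 
the fill is kept in bits/pixel, and underflow is modeled by clamping a 
negative running fill to zero, which is an assumption about how the 
printed 0 arises. 

```python
def buffer_fill(line_rates, ibuf):
    """Track buffer fill line by line.

    line_rates -- average bits/pixel produced by the coder for each scan line
    ibuf       -- constant read-out rate of the buffer in bits/pixel
    Returns the fill after each line; 0.0 marks a line ending in underflow.
    """
    fill = 0.0
    history = []
    for rate in line_rates:
        fill += rate - ibuf      # average change in the buffer for this line
        if fill < 0.0:           # assumed underflow handling: report an empty buffer
            fill = 0.0
        history.append(fill)
    return history
```

A run of zeros in the returned list corresponds to a low-activity segment 
of the scene, while a steadily growing fill signals a risk of overflow and 
suggests rerunning with a larger IBUF. 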

If MODE is not equal to 1, the average bit rate for that line is computed 
for adaptive Huffman coding. The Huffman code developed and stored in table 
look-up form for the previous scan line is used to convert input symbols 
into bits in a fashion analogous to that used for global Huffman encoding. 

The overhead required for adaptive Huffman coding is also identical to that 
required for global Huffman coding, namely the number of bits required for 
denoting the start of a new scan line and the 28 bits giving the first four 
intensity values. 

The first operation performed for Rice encoding is the determination 
of which split-pixel mode to use on the line, if any. Several checks of the 
previously computed line entropy H are made. If H ≤ 4, no split-pixel mode 
is used (n=7, k=0). If 4 < H ≤ 5, the (n=6, k=1) mode is used. If 5 < H ≤ 6, 
the (n=4, k=3) mode is used. If H > 6, the (3, 4) mode is used. As given 
in section 2.1.6 the n most significant bits are Rice encoded and the k least 
significant bits are transmitted directly to provide strictly information 
preserving data compression. If distortion is desired, k can be set in 
the program to a value less than 7-n. 
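
A minimal Python sketch of the entropy test follows. The function names 
are hypothetical, the entropy is the usual first-order entropy in 
bits/symbol, and the thresholds and (n, k) pairs are taken as read from 
the paragraph above; the handling of the exact boundary values is an 
assumption. 

```python
import math
from collections import Counter


def line_entropy(symbols):
    """First-order entropy (bits/symbol) of the symbols in one scan line."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_pixel_mode(h):
    """Return (n, k): n bits Rice encoded, k bits sent directly, n + k = 7."""
    if h <= 4:
        return (7, 0)   # no split-pixel mode
    if h <= 5:
        return (6, 1)
    if h <= 6:
        return (4, 3)
    return (3, 4)
```

In every mode n + k = 7, so the seven-bit sample is fully accounted for 
and the reconstruction remains information preserving. 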

The Rice coding is performed on blocks of data in the scan line. This 
block length is preset to 16 elements along the line but can readily be 
changed to another value if desired. The total number of bits in the block 
fundamental sequence (LFS) is first computed by determining the number of 
bits required for each symbol in the block. The algorithm used is described 
in section 2.3. The normalized fundamental sequence length (LFSN) is 
obtained by dividing LFS by the total number of symbols in the block. Length 
LFSN is tested to determine whether the fundamental sequence (FS) should be 
transmitted directly, whether the fundamental sequence should be coded (CFS), 
or whether the fundamental sequence should first be complemented (FS-bar) and 
then coded (CFS-bar). If LFSN is between 1.5 and 3 bits/symbol FS is 
transmitted, if LFSN > 3 bits/symbol CFS is transmitted, and if LFSN < 1.5 
bits/symbol CFS-bar is transmitted. Subroutine TOTALB assigns to FS or FS-bar 
the code described in Appendix C and computes the total number of bits 
required for coding that block. If FS is transmitted directly, the number of 
bits required for that block is LFS. Two overhead bits are added per block to 
denote the Rice mode used for coding that block of data. 
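
The block decision can be sketched in Python as below. The per-symbol 
length used here (position of the symbol in the order 0, 1, -1, 2, -2, ... 
plus one bit) is an assumption based on the wiggle ordering mentioned 
elsewhere in this report; the algorithm of section 2.3 is not reproduced. 
The function names and the mode labels are hypothetical. 

```python
def wiggle_index(symbol):
    """Position of a symbol in the order 0, 1, -1, 2, -2, 3, ..."""
    if symbol > 0:
        return 2 * symbol - 1
    return -2 * symbol


def block_mode(symbols):
    """Return (LFS, LFSN, mode) for one block of symbols."""
    # Assumed length rule: a symbol at wiggle position p costs p + 1 bits.
    lfs = sum(wiggle_index(s) + 1 for s in symbols)
    lfsn = lfs / len(symbols)
    if lfsn < 1.5:
        mode = "CFS-bar"   # complement the fundamental sequence, then code it
    elif lfsn <= 3:
        mode = "FS"        # send the fundamental sequence directly
    else:
        mode = "CFS"       # code the fundamental sequence
    return lfs, lfsn, mode
```

When the mode is FS the block costs LFS bits plus the two overhead bits; 
the cost of the two coded modes depends on the code of Appendix C, which 
this sketch does not model. 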

After the last block of data in the scan line has been encoded, additional 
overhead is added to the total number of bits already computed. If a split- 
pixel mode was used for that line, overhead is added for transmitting the k 
least significant bits. This overhead amounts to k times the number of 
symbols in the line of data. At the beginning of each scan line overhead is 
used to give the decoding algorithm line initialization information including 
the 7-bit intensities of the first element in each spectral band. 

At the end of each line the compression statistics are printed both 
on a line-by-line basis and globally. Depending on the value assigned to 
MODE, either the buffer statistics and the average bits/sample for the global 
Huffman coding, the average bits/sample for adaptive Huffman and Rice 
coding, or all four are printed. After the last scan line of data, the 
percentage occurrence of each Rice mode is printed (if MODE is not 
set to 1). 

2.4.3 Program BLDTAB 

Program BLDTAB generates the look-up decoding table used by program 
RSTRCT to reconstruct the compressed data. BLDTAB initially reads TAPE16, 
the tape generated by DCSTAT1 which contains the Huffman code developed 
for the given scene. Other inputs entered for each run are N, the number 
of code words, LCODE, the lumped (grouped) prefix code word, and LUMPL, the 
number of bits in the lumped code word array. IHC contains the Huffman code 
words from TAPE16 and array IC gives the corresponding number of bits in each 
code word. 

The program generates a table ITAB of length 2^12 corresponding to all 
possible sequences of twelve bits. Each entry in the table contains twelve 
bits of information, the most significant eight bits giving the first 
difference symbol which is Huffman decodable in the twelve bit address of 
that entry and the least significant four bits give the number of bits 
contained in that decodable word. Both pieces of data are required by program 
RSTRCT. The structure of BLDTAB is given in Figure 2.4-9. 

If a non-grouped symbol a_i is present which produces a code word C_i of 
length j bits, the value (a_i, j) is entered at all locations in the table 
whose binary addresses have their most significant j bits equal to the code 
word C_i. The number of entries of (a_i, j) is equal to 2^(12-j). For entries 
where the most significant bits are the lumped code word C_L (the length of C_L 
is normally constrained to be at most four bits), the following eight bits give 
the symbol level which occurred or, if the eight bits are all zeros, denote the 
start of a new scan line. The entries in the table corresponding to addresses 
beginning with code word C_L use the following eight bits in the address as the 
actual level of the grouped symbol, which lies in the range [-128,128]. 
Parameter IT shifts the lumped code word to head the 12 bit string. ISFT equals 
the total number of bits in C_L and the eight bits following C_L. IS is the 
number of bits left in the twelve bit string in the event that C_L occurs 
(IS = 12-ISFT). 

The loop ending at B sets up all lumped entries in the table required by 
the given code word. If IS is zero, only 256 entries need be set since ISFT = 12. 
If IS is greater than zero, each of the 256 lumped values must be put into 2^IS 
locations. ID, initially zero, is shifted IS bits and IW represents the 
appropriate table entry of level plus bits to be stored in ITAB at address J. 
As mentioned, if IS is not zero, 2^IS - 1 other locations must be filled with 
the same information. After ID is incremented by one, the loop is continued. 

Figure 2.4-9. Flow of Program BLDTAB 

Following generation of the lumped symbol entries, the non-lumped symbols 
are stored. In each loop ending at C, ICD represents the Huffman code word and 
ISFT represents the number of bits in that word. IW represents the table entry 
(a_i, j) to be stored in 2^N2 locations of ITAB. These entries are set by loop C. 

After all of the non-lumped symbols have been set, all 2^12 entries 
of ITAB have been generated. Figure 2.4-10 gives a segment of table ITAB. The 
top section of Figure 2.4-10 shows a portion of the lumped symbol table entries and 
the remainder gives entries corresponding to symbol levels -2 and 2. Table ITAB is 
written onto TAPE8. 
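
The table construction described in this section can be sketched in 
Python as follows. This is an illustration under stated assumptions, not 
the BLDTAB source: entries are kept as (symbol, bit count) tuples rather 
than packed twelve-bit words, the constant NEW_LINE = 256 stands for the 
new-scan-line marker, the eight bits after the lumped prefix are stored 
as a raw value, and all names are hypothetical. 

```python
TABLE_BITS = 12
NEW_LINE = 256   # assumed marker for "prefix followed by eight zero bits"


def build_itab(codes, lumped_code, lumpl):
    """Build a 2**12-entry decoding table.

    codes       -- dict: symbol -> (code word value, length in bits), non-lumped
    lumped_code -- value of the lumped prefix code word C_L
    lumpl       -- length of C_L in bits (at most four)
    """
    itab = [None] * (1 << TABLE_BITS)

    # Lumped entries: prefix, then eight bits giving the level directly.
    spare = TABLE_BITS - (lumpl + 8)          # plays the role of IS
    if spare < 0:
        raise ValueError("prefix plus eight bits must fit in twelve bits")
    for level in range(256):
        base = ((lumped_code << 8) | level) << spare
        symbol = NEW_LINE if level == 0 else level
        for j in range(1 << spare):           # 2**IS addresses share the entry
            itab[base + j] = (symbol, lumpl + 8)

    # Non-lumped entries: a j-bit code word owns 2**(12 - j) addresses.
    for symbol, (code, nbits) in codes.items():
        free = TABLE_BITS - nbits
        base = code << free
        for j in range(1 << free):
            itab[base + j] = (symbol, nbits)
    return itab
```

Because every address beginning with a given code word holds the same 
entry, the decoder can index the table with the next twelve bits of the 
stream without knowing in advance where the code word ends. 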

2.4.4 Program PKHUF 

Program PKHUF, written in COMPASS, generates the compressed data tape 
using the Huffman code from TAPE16 and the symbol tape, TAPE15. PKHUF reads the 
successive transform symbols from TAPE15 and looks up the corresponding code word 
from the stored table. These code words are then packed onto the output tape, 
TAPE7, using subroutine PACODE. The code bits are packed in a manner permitting 
a code word to overlap computer words or tape records. PKHUF is shown in Figure 
2.4-11. 

At the beginning of a scan line, the lumped code bits C_L are transmitted 
followed by eight zeros. This is followed by a single bit denoting the encoding 
mode used for the scan line. Provision is made for either SSDI or SSDIA encoding 
and this first bit informs the reconstruction program RSTRCT as to the mode used. 
The following 28 bits represent the actual seven bit intensities of the first scan 
line element in each of the four spectral bands. This information serves to 
initiate the algorithm used for reconstruction and prevents the propagation of 
possible errors from one scan line to the next. 

Subroutine INLINE reads in each new scan line of data. For this new 
scan line, PACODE packs code word C_L and the one bit of mode data onto the output 
tape TAPE7. The following four words read from TAPE15 contain the first intensity 
from each spectral band. These 28 bits are packed onto the output tape. For the 
rest of the scan line, the transform symbols a_i from TAPE15 are used to obtain 
the associated code words C_i, which are then packed onto TAPE7. There are NP 
symbols per scan line, each stored as a level between -128 and +128. To access 
code word table IHC, 128 is added to each symbol level read from TAPE15. 

Figure 2.4-10. A Segment of Table ITAB 

The program proceeds from one scan line to the next, packing the bits 
as shown into one long bit stream until the last scan line is encountered, at 
which time the program stops and an end of file is generated. 
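
The packing of variable-length code words across word boundaries can be 
sketched in Python as below. This is not the COMPASS routine PACODE: the 
function names are hypothetical, and the default word length of 60 bits 
is an assumption about the host computer. The header layout follows the 
description given earlier in this section. 

```python
def line_header(c_l, lumpl, mode_bit, first_intensities):
    """Fields opening a scan line, as (value, bit count) pairs."""
    fields = [(c_l, lumpl), (0, 8), (mode_bit, 1)]      # prefix, zeros, mode
    fields += [(v, 7) for v in first_intensities]       # four 7-bit intensities
    return fields


def pack_codes(codes, word_bits=60):
    """Pack (value, bit count) pairs into fixed-length words, MSB first.

    A code word may straddle two output words; the last word is zero padded.
    """
    words = []
    acc = 0        # bits waiting to be written
    nacc = 0       # number of waiting bits
    for value, nbits in codes:
        acc = (acc << nbits) | (value & ((1 << nbits) - 1))
        nacc += nbits
        while nacc >= word_bits:
            nacc -= word_bits
            words.append(acc >> nacc)       # emit the oldest word_bits bits
            acc &= (1 << nacc) - 1          # keep the remainder
    if nacc:
        words.append(acc << (word_bits - nacc))
    return words
```

With an eight-bit word chosen only for readability, the code words 101, 
11110000 and 1 pack into the two words 10111110 and 00010000, the second 
code word being split between them. 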

2.4.5 Program RSTHUF 

Program RSTHUF reconstructs the data using the compressed data on 
TAPE7 and the look-up decoding table ITAB from TAPE8. To initialize RSTHUF, 
table ITAB is read from TAPE8 and input parameters NLINE and LUMPL are 
entered. NLINE is the number of lines to be reconstructed and LUMPL is the 
number of bits in the lumped prefix code word. Masks MSK1, MSK2, and MSK3 
are set to mask off the seven bit intensity values from the compressed data 
tape (MSK3) and to separate the eight bit symbol level (MSK1) and the four 
bit shift value (MSK2) from each twelve bit entry in table ITAB (see Figure 
2.4-12). 

Parameter IL, the scan line designator, is initialized to zero and 
ISFT, the number of shifts of the input data, is set to twelve and the first 
coded record is read via subroutine INTAP. Loop ICO reconstructs each set of 
four intensities corresponding to a ground picture element. Loop 100 obtains 
the set of four transform differences to be decoded by use of subroutine GET12. 
GET12 returns each twelve bit sequence from the input record using the previous 
shift value ISFT and stores this word in IWD. IWD is used as address to table 
ITAB, returning the entry ITAB(IWD). This entry is masked by MSK2 to return 
ISFT, the number of shifts required for the next decoding operation, and ID(I), 
the first symbol in IWD. The value ID(I) is then shifted left four places to 
obtain the true value of the symbol. 

ID(I) is compared to the value 256 in order to check for the occurrence 
of a new scan line. The start of a new scan line is denoted in the compressed 
data by the lumped prefix code word followed by eight zeros. The corresponding 
ITAB entry gives the symbol value 256 and ISFT =12. 

Figure 2.4-12. Flow of Program RSTHUF 

At the start of a scan line flow shifts to point 500. If not the first 
scan line, the previous line of reconstructed data (NEWL) is written onto the 
output tape. If not the last scan line, subroutine GET12 shifts the input 
data to get the next twelve bits of code. The first bit gives the mode (SSDI 
or SSDIA) used for that scan line. That bit, masked by MSK3, is checked to 
set IMODE. If I = 0, IMODE = 1 for SSDIA, otherwise IMODE = 2 for SSDI. The 
following 32 bits are the true intensity values of the first element in each of 
the four spectral bands. Loop 600 reconstructs these intensities in NEWL(I,1). 
After regenerating these four intensities, line counter IL is incremented by one 
and pixel counter IP is set to 2. 

Normal SSDI or SSDIA reconstruction is performed if ID(I) < 255. IMODE 
determines the algorithm used on that line of data. If IMODE = 2, SSDI 
reconstruction is performed by loop 300 using the set of four symbol values 
obtained in loop 100 and the previously reconstructed pixels in the line. 
These reconstructed intensities are stored in NEWL(I,IP), where I denotes 
the spectral band and IP denotes the position of the element along the scan 
line. 

If IMODE = 1, SSDIA reconstruction is performed by loop 400 for each 
set of four symbols obtained by loop 100. First, as described on page 
an average of the four appropriate reconstructed intensities is computed in 
each spectral band. These averages (A) are used together with the difference 
symbol values to reconstruct the intensities in NEWL. After the last line 
has been reconstructed and written on the output tape, IL = NLINE and RSTRCT 
terminates. 

The flow of subroutine GET12 is given in Figure 2.4-13. Subroutine 
GET12 extracts each twelve bit sequence from the input tape, storing the 
sequence in IWD. Parameter ISFT is entered through the call and gives the 
number of shifts required to correctly position the new bits in IWD so that 
a new code begins in the first bit position of IWD. GET12 handles the various 
bookkeeping and bit picking tasks required when code words overlap successive 
computer words or tape records. 
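
The table-driven decoding performed with GET12 can be illustrated by the 
Python sketch below. To keep the example short it uses a four-bit window 
and a toy prefix code instead of the twelve-bit window of ITAB, and it 
represents the compressed stream as a string of '0' and '1' characters, 
so the word and record bookkeeping of GET12 is not modeled. All names 
are hypothetical. 

```python
def build_table(codes, table_bits):
    """codes: dict symbol -> code word given as a '0'/'1' string (prefix-free)."""
    table = [None] * (1 << table_bits)
    for symbol, word in codes.items():
        free = table_bits - len(word)
        base = int(word, 2) << free
        for j in range(1 << free):
            table[base + j] = (symbol, len(word))
    return table


def decode(bits, table, table_bits):
    """Decode a bit string by repeated window look-up and shift."""
    padded = bits + "0" * table_bits     # lets the final window run off the end
    out = []
    pos = 0
    while pos < len(bits):
        window = int(padded[pos:pos + table_bits], 2)   # analogous to IWD
        symbol, shift = table[window]                   # symbol and ISFT
        out.append(symbol)
        pos += shift                     # a new code word now heads the window
    return out
```

Each look-up yields both the decoded symbol and the number of bits to 
discard, so no bit-by-bit tree search is needed. 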

2.4.6 Program CRICE 

Program CRICE generates the Rice-compressed data tape RICEP. CRICE 
reads TAPE15, generated by DCSTAT1, which contains the sequence of difference 
symbols (either SSDI or SSDIA), Rice-encodes these symbols, and outputs the 
data into a continuous packed bit format onto tape RICEP. The flow of CRICE 
is given in Figure 2.4-14 and subroutine SPLIT, called for the split-pixel 
mode, is detailed in Figure 2.4-15. Parameter KSTOP specifies the number of 
scan lines to be compressed. No input specifying the number of pixels per scan 
line is required since information designating the start of a new scan line is 
contained in TAPE15. 

For each scan line, the program begins by reading in a line of data 
symbols and computing the probability distribution function of the symbols 
for subsequent use in generating the line entropy. The decision as to the 
split-pixel mode to be used on a line of data is based on the symbol entropy 
of the previous line. For this reason, the first scan line of data is encoded 
with the split-pixel mode OFF. With subsequent lines for which a split-pixel 
mode is required, subroutine SPLIT is called. For a given (n,k) mode 
each integer symbol a_i is divided by 2^k to obtain the new symbol values to 
be Rice encoded. The difference between this scaled symbol and a_i, IRR, is 
generated for subsequent direct transmission. This operation is described 
in section 2.3.2. 
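
A minimal Python sketch of the split operation is given below. The 
function names are hypothetical, and the use of floor division for 
negative symbols is an assumption; section 2.3.2 is not reproduced here 
and may treat the sign differently. 

```python
def split_symbol(a, k):
    """Split symbol a into (scaled symbol, residual) for a split of k bits."""
    q = a >> k               # floor division by 2**k
    r = a - (q << k)         # residual, 0 <= r < 2**k, sent directly in k bits
    return q, r


def merge_symbol(q, r, k):
    """Inverse of split_symbol, as needed on reconstruction."""
    return (q << k) + r
```

Since the residual is transmitted exactly, merging the decoded scaled 
symbol with its residual restores the original symbol and the split is 
information preserving. 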

Figure 2.4-15. Flow of Subroutine SPLIT 


Subroutine NEWL outputs the two bits of overhead data required at the 
start of each new scan line to designate split-pixel mode used and the initial 
pixel intensity values. Loop 250 computes the fundamental sequence for each 
block of 16 pixels (64 symbols) in the scan line by simulating the wiggle 
operation given in 2.3.1. The number of bits in the sequence NFS is normalized 
by the number of samples to obtain LFSN. The fundamental sequence, augmented 
if necessary, is sub-divided into groups of three bits. 

Based on the value of LFSN, the fundamental sequence is transmitted 
directly (BLKID = 0), coded directly (BLKID = 1), or complemented and coded 
(BLKID = 2). At the beginning of each data block PACODE generates the block 
overhead bits required to specify the Rice mode used. After the fundamental 
sequence has been encoded as required, these bits are then packed by PACODE. 
If the split-pixel mode is operable for that block, flow proceeds to F where 
the residuals IRR are sequentially packed by PACODE. 


The above operations are performed block-by-block until the end of the 
scan line is encountered. At this point the entropy H is computed for that 
scan line to determine which split-pixel mode, if any, is optimal for use on 
the following scan line (see 2.3.2). If not the last scan line, the program 
flow returns through loop 800 to process the following scan line. After the 
last scan line has been processed, subroutine OTAPE empties the output tape 
buffer and program CRICE terminates. 


2.4.7 Program RSTRIC 

RSTRIC reconstructs the Rice encoded data packed on RICEP by program 
CRICE. The flow of RSTRIC is given in Figure 2.4-16. The operation of 
this program is of greater complexity than that of RSTHUF used for 
reconstructing Huffman encoded data since, in addition to determining the 
compression mode of the data (SSDI or SSDIA), the reconstruction mode must be 
determined for each block of data. As illustrated for a block of 16 pixels 
(64 intensity samples), each block can be encoded by transmitting the 
fundamental sequence (FS), the coded fundamental sequence (CFS), or by 
complementing and coding the fundamental sequence (CFS-bar). In addition, each 
line can be transmitted in the split-pixel mode. These various modes are 
communicated to the reconstruction algorithm in the form of line and block 
overhead bits given by the format included in section 2.4.6. 


Input parameter NLINE specifies the number of lines to be reconstructed 
within loop 800. Subroutine INTAP reads the first record from the input tape 
of encoded data and GET5, similar in function to GET12 of program RSTHUF, 
obtains the first five bits in IWD to determine the new line identification 
and the mode (SSDI or SSDIA) to be used for reconstructing the data. The 
following 28 bits are used to establish the initial intensities in each 
spectral band. Following this, the next two bits give the split-pixel mode 
used for compressing that line of data. After this initialization of line 
operations, block reconstruction is performed until the next sequence of bits 
occurs denoting the start of a new scan line. 

Figure 2.4-16. Flow Diagram of Program RSTRIC 

The first two bits of block information denote which of the three 
Rice modes are used for that block. These bits are denoted by ID; if ID = 0, 
the FS is used, if ID = 1, the coded FS is used, and if ID = 2, the complemented 
and coded FS is used. If ID = 1 or ID = 2, flow goes to point C to regenerate 
the fundamental sequence. Five compressed bits are fetched at a time in IWD 
to address array IFSC which contains the decoding table. IC is the index to 
the code word sent and NONE counts the number of one bits decoded in the 
fundamental sequence. 

Vector IFS contains the regenerated fundamental sequence, complemented 
if ID = 2, and ISFT contains the number of bits in the current codeword. To 
decode the next codeword the compressed bit sequence is shifted by ISFT bits 
and the following five bits are extracted. This procedure continues until 
there are 64 one bits in the decoded fundamental sequence, implying that 
all codewords in the block have been decoded. If ID = 0, the fundamental 
sequence is regenerated directly in IFS by extracting the compressed bit string 
until IFS contains 64 one bits. 

Following reconstruction of the fundamental sequence in IFS, flow 
proceeds to D and loop 250 which initializes the 64 compressed symbols (IRS) 
to be reconstructed and the 64 residuals (IRR) used for the split-pixel 
mode. If a split-pixel mode (n, k) is used, loop 255 extracts the following 
64k bits to regenerate these residuals in vector IRR. 

Loop 300 regenerates the vector IRS of symbols generated by the SSDI or 
SSDIA algorithms. The technique used to generate the fundamental sequence 
involved wiggling through the symbol levels in the order 0, 1, -1, 2, -2, 3, 
etc., and the inverse of this operation is performed to obtain these symbols 
in IRS. Parameter IS, initialized to zero, keeps track of the symbol value 
being reconstructed in each pass. Initially, the first 64 bits of the 
fundamental sequence are tested and the presence of a one bit at location 
I sets entry IRS(I) = 0. This first pass reconstructs all symbols having a 
zero value. On successive passes, the bits IFS are tested and the presence 
of a one bit in location I sets IRS(I) = IS, the symbol level currently being 
generated. Since the entries in vector IRS were initialized to 1000, the 
presence of any other value in an entry of IRS signals that this symbol has 
already been reconstructed and this symbol is skipped on succeeding passes. 
Each occurrence of a one bit in the fundamental sequence increments parameter 
NONE. When NONE = 64, all symbols have been reconstructed for that block 
and flow exits loop 300 to point G which contains the inverse SSDI or SSDIA 
algorithm, depending on the mode used for compressing the data. Loop 510 
reconstructs each set of four intensities based on the symbols IRS and 
residuals IRR which have been regenerated from the compressed data. Loop 
520 reconstructs the sixteen pixels contained in the current block. If the 
SSDI mode is used, LEWL contains the preceding reconstructed pixel intensities 
in that scan line. If the SSDIA mode is used, LEWL contains the average of 
the four appropriate reconstructed intensities, as obtained by subroutine AVG. 
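
The pass-by-pass inverse of the wiggle operation can be sketched in 
Python as below. The decoder follows the description of loop 300; the 
encoder is simply its mirror image and is an assumption included so the 
example can be checked end to end. The function names are hypothetical, 
and the sentinel 1000 is the initial value quoted above. 

```python
UNSET = 1000   # initial value of every entry of IRS


def level_for_pass(p):
    """Symbol level tested on pass p: 0, 1, -1, 2, -2, 3, ..."""
    if p == 0:
        return 0
    return (p + 1) // 2 if p % 2 else -(p // 2)


def fs_encode(symbols):
    """Assumed encoder: one bit per still-uncoded symbol on every pass."""
    bits = []
    done = [False] * len(symbols)
    p = 0
    while not all(done):
        level = level_for_pass(p)
        for i, s in enumerate(symbols):
            if done[i]:
                continue                 # already coded on an earlier pass
            hit = (s == level)
            bits.append(1 if hit else 0)
            done[i] = hit
        p += 1
    return bits


def fs_decode(bits, n):
    """Rebuild n symbols from the fundamental sequence (loop 300 analogue)."""
    irs = [UNSET] * n
    pos = 0
    none = 0                             # count of one bits seen so far
    p = 0
    while none < n:
        level = level_for_pass(p)
        for i in range(n):
            if irs[i] != UNSET:
                continue                 # skip symbols already reconstructed
            if bits[pos] == 1:
                irs[i] = level
                none += 1
            pos += 1
        p += 1
    return irs
```

Decoding stops as soon as the count of one bits equals the number of 
symbols in the block, which is how the end of the fundamental sequence is 
recognized without a separate length field. 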

After reconstruction of a block, GET5 extracts the next five bits and 
tests the following two bits to determine whether the following compressed 
data represents another block in the same scan line or the start of a new 
scan line. If a new scan line, the current reconstructed scan line of data 
is output on a tape and flow proceeds within loop 800 until all NLINE scan 
lines have been reconstructed. 

Provision is made through DATA statements for varying block sizes, 
split-pixel modes, and selection of code words for the fundamental sequence 
in the event that the operator decides to change these parameters of the Rice 
algorithm. 


    Field    LID      UNCODED FIRST ELEMENT   BID      RICE CODED n BITS   k BITS 
    Length   2 bits   28 bits                 2 bits   Variable            Lk bits 

3. RESULTS OF THE INVESTIGATION 


This section discusses the significance of the various measurements 
performed on the ERTS tapes, the selection of scenes processed, and 
summarizes the results obtained for both normal ERTS data and for imagery 
containing anomalous data. Conclusions and tradeoffs based on these 
results are discussed further in section 4. 

3.1 SUMMARY OF PROCESSING PERFORMED 

During the data analysis phase of the study, thirty subscenes and 
four full scenes were processed and the following output quantities were 
obtained for each scene: 

1. MSS Data Statistics 

   a. Data mean and variance per spectral band and averaged over 
      all bands. 

   b. Cross spectral-spatial correlation. 

   c. Spectral correlation (joint probability density function). 

2. Data Compression Performance 

   a. Probability distribution function of first differences 
      obtained by the SSDI, SSDIA, and SSDIAM modes. 

   b. Probability distribution function of the Shell, SSDI, 
      SSDIA, and SSDIAM symbols. 

   c. Compression achieved by fixed Huffman coding of the scene using 
      the Shell, SSDI, SSDIA, and SSDIAM modes. 

   d. First-order entropy of these distributions. 

   e. Line-by-line and overall time-varying data compression using 
      the fixed Huffman, adaptive Huffman, and Rice algorithms on 
      the selected compression mode. 

   f. Buffering statistics of the Huffman or Rice code. 

   g. Huffman codes associated with each of the compression modes. 

All four full scenes were compressed and reconstructed using the 
SSDIA/Huffman or SSDI/Rice algorithm and photographs were made from the 
reconstructed data. In addition to the above outputs, the following data 
have been obtained for the principal full scene, ERTS 1015-17440: 

a. Photographs of the reconstructed image obtained by use of the 
essentially information preserving SSDIAM algorithm with 
mappings of ±1 level and ±3 levels. 

b. Photographs of reconstructed images with the compressed data 
corrupted by simulated channel errors (10~ 5 and 10” £ bit 
error probabilities). 

c. Probability distribution of the original intensity levels 
of the MSS data. 

3.2 SCENES AND OBJECT CLASSES PROCESSED 

During the investigation thirty 5 x 5 nautical mile square subscenes 
and four 25 x 25 nautical mile square full scenes were processed. The 
selection of scenes to be processed was based on several criteria including 
availability from NASA-ERTS User Services. A minimal set of nine object 
classes was specified in the data analysis plan. This set of object classes 
is as follows: 

1. Clouds 

2. Bodies of Water (Lakes, Oceans) 

3. Rivers 

4. Snow 

5. Mountains 

6. Agriculture 

7. Plains 

8. Deserts 

9. Forests. 

These classes were based on coverage of the predominant objects 
encountered on earth survey missions rather than selection by the criteria 
of usefulness to current principal investigators. Classes were further chosen 
to span the range of source data activity, thereby serving to roughly 
bound the expected compressed data rates which could occur. 

In addition to the nine classes given above, the object classes of 
"cities" and "grassland" were added. Beyond the homogeneous object 
classes, several composite classes were also selected, corresponding 
to class variants which are commonly encountered and which would be 
expected to yield different data and compression statistics. These object 
classes and the basis for their selection are detailed below: 

• Coastline and Harbor - These features are encountered at 
the boundary between land and bodies of water. Since the 
data statistics and characteristics of these two features 
vary greatly, it is of interest to determine what effect 
the land-water interface has on the various compression 
algorithms, especially for the global Huffman codes which 
develop a code optimal for neither the water nor the land. 

• Island - Similar to the coastline class, this object is of 
primary interest in the determination of the time-varying 
bit rate and buffer statistics as the data activity varies 
from very low (ocean) to high (island). 

• Haze - The effect of haze obscuring the land features amounts 
to a decrease in contrast and data activity compared to the 
scene without haze. The predominant characteristic sought is 
the decrease in bit rate produced by such haze. 

• Scattered Clouds over Water - This class produces a wide 
distribution of difference intensities in the compressed 
symbols due to the large intensity steps between the bright 
clouds and dark water. Processing this class serves to evaluate 
the ability of the compression algorithms to handle such symbol 
distributions. 

Certain classes subject to wide variations were processed as conditions 
changed. As an example, three variants of the object class containing 
mountains were selected: bare mountains, mountains with vegetation, and 
mountains with snow cover. As expected, the results differ substantially. 
Several scenes containing agriculture were processed, corresponding to 
differing crops and field sizes. 

The statistics and compressed rate can vary on any given object 
class and for any given location depending on varying factors such as the 
time of year, sun angle, and cloud cover. While it was not possible to 
obtain results for all such variations within the scope of the current 
investigation, it is felt that the results which have been obtained are 
representative of each class and serve the objectives of the study. 

The object class subscenes were chosen to encompass an area of 
25 square miles since the chosen object classes can be located to span 
such an area and such a size yields statistically significant results. 

The 625 square mile full scenes were selected either to contain a suitable 
grouping of object classes or to ascertain whether the data and compression 
statistics of an object class covering such an area would deviate 
significantly from the statistics of a subscene containing that class. Emphasis 
was given to subscene processing since the degree of compression expected 
for a given full scene can be estimated by knowing the percentage occurrence 
of the various object classes in the full scene and the average compressed 
bit rate obtainable for the various object classes, as determined by 
subscene processing. Conversely, since full scenes normally contain several 
object classes, it is not possible to extrapolate the bit rate and 
statistics measured for the full scene to subscenes and object classes contained 
within it. 
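
The estimate described above amounts to a weighted average of the class 
bit rates. A minimal Python sketch follows; the function name is 
hypothetical and any class names or rates supplied to it in an example 
are illustrative values, not measurements from this study. 

```python
def estimate_scene_rate(class_fraction, class_rate):
    """Estimate the full-scene bit rate from its object-class make-up.

    class_fraction -- dict: class name -> fraction of the scene it occupies
    class_rate     -- dict: class name -> average compressed bits/sample
    """
    total = sum(class_fraction.values())     # normalize if fractions are percents
    weighted = sum(f * class_rate[c] for c, f in class_fraction.items())
    return weighted / total
```

For instance, a scene assumed to be half water at 2.0 bits/sample and 
half forest at 4.0 bits/sample would be estimated at 3.0 bits/sample. 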

After selection of object classes, tapes were selected from supplied 
imagery to include the various required subscenes and full scenes processed 
during the study. Each entry of Table 3.1 gives the scene identification number, 
object class description, location, earth coordinates, ERTS tape number, and the 
location (in pixels) of the upper left-hand corner of the scene processed. 

Note that scenes one through thirty are subscenes and numbers thirty-one 
through thirty-four are full scenes. These scene identification numbers will 
be used throughout this report. 

Table 3.1. Parameters of Scenes Processed 

SCENE  CLASS DESCRIPTION      LOCATION              COORDINATES (N; W) TAPE NO.    STRIP  STARTING PIXEL
NO.
1      CLOUDS                 KANSAS                (36°40'; 101°10')  1043-16573  4      (230,180)
2      BAY                    CHESAPEAKE BAY        (38°05'; 76°15')   1062-15190  4      (1900,420)
3      LAKE                   LAKE MICHIGAN         (43°15'; 87°15')   1017-16093  4      (220,370)
4      OCEAN                  PACIFIC OCEAN         (34°15'; 119°30')  1018-18010  1      (1300,10)
5      LAKE                   LAKE ST. JOHN, CAN.   (48°35'; 72°00')   1025-15103  2      (880,420)
6      SNOW                   VERMONT               (46°10'; 73°10')   1170-15173  4      (400,420)
7      COASTLINE              MASS., N.H.           (42°45'; 70°45')   1167-15011  3      (1080,350)
8      CLOUDS (OVER FOREST)   QUEBEC, CANADA        (49°20'; 70°30')   1025-15103  4      (1800,301)
9      DESERT                 IMPERIAL VALLEY, CA.  (32°30'; 113°50')  1015-17440  4      (2050,600)
10     HARBOR                 LONG BEACH, CA.       (34°05'; 117°30')  1018-18010  4      (1850,20)
11     PLAINS                 TEXAS PANHANDLE       (35°40'; 103°00')  1043-16573  1      (1700,540)
12     CLOUDS (OVER OCEAN)    SOUTHERN CALIFORNIA   (33°45'; 119°30')  1018-18010  1      (2010,420)
13     FOOTHILLS              SAN BERNARDINO, CA.   (33°15'; 113°45')  1106-17501  4      (1000,301)
14     FOREST                 VIRGINIA              (38°00'; 77°30')   1062-15193  2      (300,270)
15     ISLAND                 CATALINA, CA.         (33°25'; 118°30')  1108-18020  3      (510,600)
16     HAZE (OVER MOUNTAINS)  QUEBEC                (45°50'; 75°20')   1170-15173  2      (1000,440)
17     CITY                   LOS ANGELES, CA.      (34°00'; 118°10')  1018-18010  3      (1700,600)
18     DESERT                 MOJAVE DESERT         (34°55'; 117°55')  1018-18010  3      (150,600)
19     (BARE) MOUNTAINS       SAN BERNARDINO, CA.   (33°15'; 114°25')  1015-17440  3      (1300,10)
20     CITY                   CHICAGO               (41°50'; 87°40')   1017-16093  4      (1800,500)
21     GRASSLAND              NEBRASKA              (41°00'; 101°05')  1007-16560  2      (5,50)
22     FOREST                 QUEBEC, CAN.          (49°00'; 72°20')   1025-15103  1      (500,610)
23     RIVERS (IN FOREST)     QUEBEC, CAN.          (47°50'; 73°05')   1025-15103  1      (2095,150)
24     AGRICULTURE            NORTH KANSAS          (39°50'; 101°25')  1007-16560  2      (1800,500)
25     AGRICULTURE            TEXAS/OKLA.           (36°20'; 103°05')  1043-16573  1      (1000,100)
26     (VEGETATED) MOUNTAINS  SOUTHERN CALIFORNIA   (34°15'; 117°45')  1018-18010  4      (1070,370)
27     AGRICULTURE            ILLINOIS              (41°50'; 88°40')   1017-16093  2      (1990,1)
28     (SNOW IN) MOUNTAINS    LAKE TAHOE, CA.       (38°35'; 119°55')  1128-18120  3      (1420,330)
29     FOREST                 WISCONSIN             (43°15'; 88°36')   1017-16093  2      (50,50)
30     AGRICULTURE            IMPERIAL VALLEY, CA.  (32°45'; 115°30')  1015-17440  1      (2000,100)
31     PRINCIPAL FULL SCENE   IMPERIAL VALLEY, CA.  (33°40'; 114°35')  1015-17440  2      (880,1)
32     FOREST                 QUEBEC, CANADA        (49°00'; 72°00')   1025-15103  2      (290,1)
33     MOUNTAINS              SOUTHERN CALIFORNIA   (34°35'; 119°00')  1018-18010  1      (800,1)
34     DESERT                 MOJAVE DESERT         (34°50'; 118°05')  1018-18010  4      (4,1)



3.3 ERTS IMAGERY CHARACTERISTICS AND STATISTICS 
3.3.1 Significance of Statistical Measurements Used 

The performance of data compression algorithms such as the SSDI 
and SHELL techniques used in this study is highly dependent on the 
characteristics of the source data. The compression achieved on a scene is 
proportional to the degree of spectral and spatial correlation existing in 
the data. For this reason, knowledge of the statistics of the multispectral 
scanner data is important. 

Several data measurements were performed for all scenes evaluated. 
Initially the mean and variance of the data are obtained for each spectral 
band and subsequently averaged over all bands. The mean intensity level of 
each spectral band is averaged over the scene to indicate the average 
distribution of spectral energies in the scene. The variance of each band 
corresponds to the degree of data activity within that spectral band as 
averaged over the scene. 

Neither the mean nor the variance is an accurate indicator of the 
compression that can be achieved for the scene, although a very low variance 
indicates high compression and a very high variance normally corresponds to 
a low degree of compression. Since the compression algorithms are essentially 
forms of differential pulse code modulation (DPCM), the effects of the 
differing spectral means are eliminated. Since the algorithms are based only 
on localized data activity it is possible for the intensity levels in each 
band to vary greatly over different portions of the global scene, yielding 
a large data variance, while varying slowly over local areas of the scene. 
Thus, overall variance of the intensity within the spectral bands is only 
a weak indicator of compression performance. 

The cross spectral-spatial correlation is a more accurate indicator 
of the compression performance for a scene because it measures the localized 
changes in the spectral information. Each ground picture element corresponds 
to a four-dimensional intensity vector $I_i$ constructed from the intensity values 
of the four spectral components corresponding to that element. This vector 
moves through the spectral subspace having a unique location for each ground 
element. If the data activity is low in a given region, the vector movement 
is small between adjacent pixels and, conversely, the vector can undergo 
large excursions between pixels in regions of high data activity. 

For the algorithms used in this study, compression is not only 
dependent on the magnitude and variability of the vectors from pixel 
to pixel but on the type of the movement. If the vector direction remains 
unchanged from pixel to pixel and only the magnitude varies, the compression 
can remain high since this corresponds to a high spectral correlation 
between pixels. Conversely, changes in vector direction with unchanged 
magnitude correspond to low spectral correlation and hence less 
compression. 

The cross spectral-spatial correlation was developed to correspond 
to this dependence of the algorithm on the data activity. For each pair of 
intensity vectors $I_i$ and $I_{i+k}$ in the scene, where k is the spatial separation of 
the elements, the normalized dot product is formed. Normalization removes 
the effect of scene illumination and the effects of magnitude changes while 
the dot product reflects the variation in vector direction between these 
elements. The closer this normalized dot product is to unity, the greater the 
correlation between pairs and the better the expected performance of the 
algorithms. The curves generated for each scene reflect the average 
correlations obtained. The abscissas of the curves give the element spacing 
in pixels, the ordinates represent the normalized cross spectral-spatial 
correlation, $\rho_k$, and the curves give the percent of vector pairs in the scene which 
have a correlation greater than $\rho_k$ for the spacing k. 
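
The normalized dot product and the percentage statistic plotted in the curves can be sketched as follows. This is an illustrative reconstruction in Python, not the program used in the study; the function names, the handling of all-zero pixels, and the sample scan line are assumptions.

```python
import math


def normalized_dot(u, v):
    """Cosine of the angle between two spectral intensity vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero pixel is treated as uncorrelated
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def percent_above(scan_line, k, threshold):
    """Percent of pixel pairs at spacing k whose normalized dot product
    exceeds the given threshold."""
    pairs = [(scan_line[i], scan_line[i + k])
             for i in range(len(scan_line) - k)]
    if not pairs:
        return 0.0
    hits = sum(1 for u, v in pairs if normalized_dot(u, v) > threshold)
    return 100.0 * hits / len(pairs)


# Hypothetical four-band pixels. The second is the first at twice the
# illumination, so the pair is fully correlated despite the magnitude change.
line = [(50, 58, 57, 24), (100, 116, 114, 48), (52, 57, 56, 25), (10, 60, 20, 5)]
pct = percent_above(line, 1, 0.99)
```

In this example two of the three adjacent pairs exceed the 0.99 threshold; the last pixel changes spectral direction sharply and lowers the count.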

The joint probability distribution function measures the joint occurrence 
of first difference values over the scene. A four-dimensional difference 
vector is formed by subtracting each pixel intensity from the intensity of 
the preceding pixel along the scan line. This vector corresponds to the first 
set of differences as obtained by the SSDI algorithm. Each difference vector 
corresponds to a point in a four-dimensional space. By counting the occurrence 
of these vectors over the scene and converting to a percentage occurrence, 
the clustering of the differentials can be observed. The location and degree 
of clustering of these difference vectors about the origin (0,0,0,0) gives 
another indication of achievable compression. Because a four-dimensional 
plot cannot be produced, only the two-dimensional projections of the probability 
space are given, and all $\binom{4}{2} = 6$ projections are printed by the computer. In 
order to produce an intelligible output, the occurrence of all zero difference 
values, the most probable occurrence, is normalized to the value 100 and all 
other probabilities are normalized to this value. Therefore, if the occurrence 
of all +1 difference values is assigned the value 60, this means that 
p(1,1)/p(0,0) = .6. If a joint set of differences occurs 
less than 1% as often as the joint occurrence of zeros, no value is 
printed for that location in order to produce a clean display. 
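
One of the six two-band projections can be sketched as below. The Python function and the sample data are hypothetical; the sign convention (preceding pixel minus current pixel) follows the wording above, and the rounding to whole numbers is an assumption about the printed display.

```python
from collections import Counter


def joint_difference_table(band_a, band_b):
    """Joint occurrence of first differences in two bands along a scan line,
    scaled so that the (0, 0) entry equals 100."""
    diffs = Counter(
        (band_a[i - 1] - band_a[i], band_b[i - 1] - band_b[i])
        for i in range(1, len(band_a))
    )
    zero_count = diffs[(0, 0)]
    if zero_count == 0:
        raise ValueError("no (0, 0) differences to normalize against")
    table = {}
    for pair, count in diffs.items():
        scaled = 100.0 * count / zero_count
        if scaled >= 1.0:  # entries under 1% of the zero count are left blank
            table[pair] = round(scaled)
    return table


# Hypothetical scan-line segments for two bands.
band_a = [8, 8, 8, 8, 8, 8, 7, 6, 5]
band_b = [6, 6, 6, 6, 6, 6, 5, 4, 3]
table = joint_difference_table(band_a, band_b)
```

Here the (0, 0) pair occurs five times and the (1, 1) pair three times, so the (1, 1) entry is 60, matching the form of the example in the text.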

3.3.2 Results Obtained 

This study produced well over 1000 pages of computer printout. A 
complete set of output is given in Appendix B for the principal full scene. 

This section summarizes the various statistical measurements obtained for the 
thirty-four scenes by presenting results typical of the range of data generated, 
including anomalous data. 

Figure 3.3-1 gives the probability density of the data intensities 
present in each band of the principal full scene (number 31). The means 
and variances of the data in this scene are: 

            Band 1    Band 2    Band 3    Band 4    Average
Mean        51.220    57.761    56.890    23.880     47.438
Variance   174.436   273.124   170.609    43.519    356.721

Note that the average variance is not the average of the variances in 
each spectral band. This measure represents the overall variance of the data 
in all spectral bands based on the average mean of the four bands. It is 
normal for this variance to be greater than the variances of the individual 
bands, as in the case above. 
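
The relation between the band statistics and the overall variance can be checked numerically. The sketch below assumes that each band contributes the same number of samples, in which case the overall variance is the mean of the band variances plus the mean squared deviation of the band means from the average mean. The function name is hypothetical.

```python
def overall_mean_and_variance(means, variances):
    """Combine per-band statistics, assuming equal sample counts per band."""
    n = len(means)
    grand_mean = sum(means) / n
    within = sum(variances) / n                          # mean of band variances
    between = sum((m - grand_mean) ** 2 for m in means) / n  # spread of band means
    return grand_mean, within + between


# Band values for the principal full scene as tabulated above.
means = [51.220, 57.761, 56.890, 23.880]
variances = [174.436, 273.124, 170.609, 43.519]
grand_mean, overall_variance = overall_mean_and_variance(means, variances)
```

With the tabulated values this reproduces the average mean of 47.438 and an overall variance of about 356.72. Most of the excess over the individual band variances comes from band 4, whose mean lies far below the other three.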

Note that the probability density function of the MSS data in spectral 
bands 1, 2, and 3 is not continuous, as might be expected, but contains 
intensity levels having much lower occurrence than adjacent levels. This 
anomaly arises from the decompression algorithms used for ground processing 
of the received data. The effect of this intensity mapping is quite evident 
in the probabilities of levels 56 through 64 of band 2. The distribution of 
intensities in band 4 (infrared) does not exhibit this peculiarity since no 
mapping is performed for that band. This mapping has an adverse effect on 
the SSDI algorithms, which are based on differences between successive 
intensity levels, because the mapping introduces another source of 
noise into the data. The SSDIA and the SSDIAM algorithms tend to smooth 
these fluctuations due to the averaging operations in the SSDIA and the 
remapping ability of the SSDIAM. 

Figure 3.3-1. Probability of Occurrence of Intensity Levels in Each Band of 
Principal Full Scene (31) 

Figures 3.3-2 and 3.3-3 show the probability density of the data 
intensities for scene 28. Figure 3.3-2 is based on data received from 
the spacecraft while Figure 3.3-3 is based on the data after ground processing. 
The effect of the ground mapping algorithms is evident in a comparison of 
these two figures. 

A second anomaly present in several of the subscenes processed is 
due to the presence of a bad sensor in band 2 (MSS-5). This defect produces 
abnormally low intensity values on every sixth scan line of data in band 2. 
Figure 3.3-4 gives a segment of input data which clearly illustrates the 
occurrence beginning on the fourth scan line shown and recurring every sixth 
line. The effect of this anomalous data on the compression algorithms will 
be discussed further in Section 3.4.2. 

Figures 3.3-5 through 3.3-8 show several plots of cross spectral-spatial 
correlation for processed scenes. Figure 3.3-5 is from scene number 
4 (ocean) and shows highly correlated data since the spectral vector undergoes 
only minor changes in direction over the scene. As expected from such a high 
degree of correlation, a very low compressed bit rate was produced for this 
scene. Figure 3.3-6, from scene number 22 (forest), illustrates a low degree 
of correlation, implying a large average change in vector direction for even 
closely spaced pixels. Figures 3.3-7 and 3.3-8 give the 95% correlation 
curves for several additional object classes. 

Similarly, Figures 3.3-9 through 3.3-11 show several joint probability 
distributions of first differences as obtained by the SSDI algorithm to 
illustrate several types of clustering which occurred. Figure 3.3-9 
illustrates a high degree of clustering of joint differences about (0,0) 
in spectral bands 1 and 2, based on scene 4 (ocean). Figure 3.3-10 
illustrates a wide spread of differences as produced by scene 23, 
which has higher data activity. Figure 3.3-11 shows an intermediate case 
from scene 29. 

Figure 3.3-3. Probability of Occurrence of Intensity Levels in Each Band of Scene 28 
(Ground-Processed Data) 

Figure 3.3-4. A Segment of Input Data from Scene 8 (Band 2) 

Figure 3.3-7. Spectral-Spatial Correlations of Scenes 15, 21, and 32 (95% Curves Shown) 

Figure 3.3-8. Spectral-Spatial Correlations of Scenes 7, 22, and 26 (95% Curves Shown) 

Figure 3.3-9. Joint Probability Density of Band 3 and Band 4, Scene 4 

Figure 3.3-10. Joint Probability Density of Band 3 and Band 4, Scene 23 

Figure 3.3-11. Joint Probability Density of Band 3 and Band 4, Scene 29 



An estimate of the total system noise energy was made based on three 
uniform subscenes. While the figures are necessarily an upper bound on the 
sum of sensor, quantization, and decompression noises, they serve to 
approximate the minimum data activity produced by the ERTS system. Table 
3.2 lists the variances obtained for each spectral band based on the 
selected subscenes: 

Table 3.2. Estimated System Noise Energy 

          Scene 1   Scene 3   Scene 4
Band 1      .326      .964      .924
Band 2      .343      .576      .780
Band 3     1.350      .666      .579
Band 4      .844      .304      .332


The noise component produced by the scanner mechanism is a function 
of light intensity. Scenes 3 and 4 are based on water, which produces low 
incident light levels, whereas scene 1 is based on cloud cover, which entails 
a high intensity level. No estimates were taken for scenes of intermediate 
intensity since the data activity in such areas is much greater than the 
system noise level. 

3.4 DATA COMPRESSION CHARACTERISTICS AND STATISTICS 
3.4.1 Significance of the Compression Statistics Measured 

For each subscene and full scene processed, several key statistics 
and global measures of the data compression performance were obtained. 
These measurements serve as an indicator of the compression to be expected 
on similar subscenes containing the same object class and permit an evaluation 
of the efficacy of each compression technique for similar data. 

The probability distribution of the symbols obtained by each algorithm 
indicates the level of performance to be expected when using that algorithm 
on the data. The greater the clustering of these symbols about zero, the 
higher the compression when the symbols are encoded by the Rice or 
Huffman techniques. For most data, the variance of the data decreases 
progressing from the SSDI to the SSDIA and from the SSDIA to the SSDIAM 
algorithms. For anomalous data, such as that obtained with a defective 
sensor, this may not be the case. The entropy of the symbol distribution 
is also computed for each algorithm since the entropy forms a lower bound to 
the average compressed bit rate. 
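
The entropy bound can be computed directly from a symbol sequence. The following Python sketch is illustrative only; the function name and the sample symbols are hypothetical.

```python
import math
from collections import Counter


def symbol_entropy(symbols):
    """First-order entropy in bits/symbol of the observed symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical symbols clustered about zero, with probabilities
# 1/2, 1/4, 1/8, and 1/8.
symbols = [0] * 8 + [1] * 4 + [-1] * 2 + [2] * 2
entropy = symbol_entropy(symbols)
```

For this distribution the entropy is 1.75 bits/symbol. No code that assigns one codeword per symbol can average fewer bits than this figure.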

Based on this symbol distribution, the computer generates the 
Huffman code for each technique. Since the symbol statistics are measured 
globally for the entire scene, this code would be used for non-adaptively 
encoding that scene. The grouped Huffman code, as described in Section 2.2.3 
is used in order to simulate a technique which has shown high promise for 
ground data compression usage. The codeword assigned to each symbol is 
printed together with the grouped codeword prefix beside those symbols which 
form the grouping. In addition, the number of bits in the codeword for each 
symbol, the total probability of symbols in the grouping, and the average 
compressed bit rate for the scene are displayed. The Huffman coding efficiency 
for each technique can be obtained by comparing the average bit rate with 
the symbol entropy. In addition to obtaining the global Huffman average bit 
rate for the scene based on each technique, the average bit rate is computed 
for the scene based on use of the adaptive Huffman or Rice algorithms for the 
encoding of either the SSDI or SSDIA symbols. Since the adaptive Huffman 
technique generates a new Huffman code for each block of data, no output of 
these codewords is practical. 
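
The comparison of average bit rate with symbol entropy can be sketched for an ordinary binary Huffman code. The grouped Huffman code of Section 2.2.3 is not reproduced here; the function below is a plain Huffman construction, and the symbol probabilities are hypothetical.

```python
import heapq
import math


def huffman_code_lengths(probabilities):
    """Return {symbol: codeword length in bits} for a binary Huffman code."""
    if len(probabilities) == 1:
        return {next(iter(probabilities)): 1}
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probabilities, 0)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, tie, group1 + group2))
        tie += 1
    return lengths


# Hypothetical symbol distribution measured over a scene.
probs = {0: 0.6, 1: 0.2, -1: 0.15, 2: 0.05}
lengths = huffman_code_lengths(probs)
average_length = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
efficiency = entropy / average_length
```

For this example the average length is 1.6 bits/symbol against an entropy of about 1.53 bits/symbol, a coding efficiency of roughly 96%.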

The percentage occurrences of the three Rice modes (FS, CFS, and $\mathrm{C\overline{FS}}$) are 
printed, as are the percentage occurrences of the split-pixel modes. The 
percentages of the Rice modes which occur depend on the overall data 
activity within the scene as well as on local variations of data activity 
within the scene. Thus, even a scene which has an overall high data activity 
normally has some subregions with moderate or low activity. The split-pixel 
modes only occur when the data activity in a block produces an average bit 
rate greater than 4 bits/sample. 

The time-varying data compression of the scene is also printed for the 
global Huffman, adaptive Huffman, and Rice codes as the average bit rate per 
scan line. These bit rates fluctuate with the 
average data activity in each scan line. Trends of data activity and 
the effects of sensor or system anomalies can be discerned by observing 
these statistics. In addition, the performance of the three techniques 
can be compared on a line-by-line basis as well as by judging the overall 
compressed bit rate of each technique. While one of the three techniques 
will produce the lowest bit rate on most scan lines of a given scene, there 
will be some lines in which a lower rate is produced by a competing 
technique. 

The buffer statistics are computed for the selected SSDI mode with 
Rice encoding for the two fixed-rate buffer outputs of 3.0 and 3.5 bits 
per sample. These rates were selected since they correspond to an average 
fixed-rate compression of 2:1 or more and they bound intermediate buffer 
output rates. The values printed for each scan line represent the total 
accumulated number of bits in the buffer at the end of the scan line, assuming 
zero buffer bits at the start of the first scan line. The buffer total 
increases when the average input bit rate for the scan line surpasses the fixed 
output rate and the total decreases when the average input bit rate is less 
than the output bit rate. Buffer underflow is printed as zero bits. These 
buffer statistics are primarily a consideration for compression performed 
aboard a spacecraft where the tradeoff of transmitted data rate versus buffer 
capacity must be considered. 
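
The buffer accounting described above can be sketched as a running total clamped at zero. In the Python sketch below, the function name, the per-line bit rates, and the figure of 100 samples per scan line are hypothetical and serve only to show the bookkeeping.

```python
def buffer_fill(line_bit_rates, samples_per_line, output_rate):
    """Buffer contents (bits) at the end of each scan line for a fixed
    output rate, starting from an empty buffer."""
    fill = 0.0
    history = []
    for rate in line_bit_rates:
        fill += (rate - output_rate) * samples_per_line
        fill = max(fill, 0.0)  # underflow is reported as zero bits
        history.append(fill)
    return history


# Hypothetical average compressed bit rates for five scan lines.
rates = [2.5, 3.75, 4.0, 3.0, 3.25]
history = buffer_fill(rates, samples_per_line=100, output_rate=3.5)
peak = max(history)
```

The buffer stays empty on the first line, grows while the line rate exceeds 3.5 bits/sample, and drains afterward. The peak value corresponds to the peak buffer fill reported for each scene.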

3.4.2 Results Obtained 

The overall data compression achieved on each scene processed is 
summarized in Table 3.3 for all the compression techniques used. The first 
column contains the scene identification number, correlated with Table 3.1. 
Columns six through nine give the average bit rate achieved on the scene by 
the Shell, SSDI, SSDIA, and SSDIAM (shown for single-level mapping, |m| = 1) 
algorithms followed by global Huffman coding. Columns two and three give the 
average bit rate for the scene as produced by the adaptive Huffman and Rice 
algorithms for symbols generated by either the SSDI or SSDIA (see column 5). 
Column 4 gives the peak buffer fill (in bits) generated by the scene for an 
output buffer rate of 3.5 bits per input intensity sample. Table 3.3 permits 
the determination of the expected bit rate that can be achieved on the various 
object classes using the different algorithms. The bit rate produced varies 
from a low of 1.220 bits to a high of 3.821 bits. 

Table 3.3. Compressed Bit Rates of Scenes Processed 

        ADAPTIVE ALGORITHMS                     GLOBAL HUFFMAN
SCENE                     PEAK BUFFER
NUMBER  HUFFMAN  RICE     FILL (BITS)  SYMBOLS  SHELL   SSDI    SSDIA   SSDIAM
1       1.42     1.31     0            SSDI     1.697   1.508   1.491   1.427
2       1.45     1.22     0            SSDIA    1.912   1.920   1.458   1.380
3       1.49     1.27     0            SSDIA    2.037   2.123   1.532   1.485
4       1.53     1.37     0            SSDIA    2.162   1.953   1.618   1.461
5       1.59     1.73     0            SSDIA    1.711   1.687   1.934   1.887
6       1.78     1.79     0            SSDIA    2.014   2.135   1.890   1.453
7       2.39     2.41     0            SSDIA    2.386   2.377   2.478   2.182
8       2.46     2.66     28,493       SSDI     2.842   2.784   3.089   2.648
9       2.47     2.48     0            SSDIA    2.945   2.750   2.521   2.069
10      2.61     2.58     0            SSDIA    3.043   3.156   2.762   2.285
11      2.55     2.67     22,328       SSDI     2.827   2.790   3.251   2.724
12      2.64     2.67     1,134        SSDIA    2.956   3.106   2.861   2.559
13      2.71     2.85     1«1          SSDIA    3.160   2.883   3.106   2.693
14      2.81     2.76     0            SSDIA    3.233   3.268   3.096   2.655
15      2.89     2.48     19,415       SSDIA    3.135   3.374   2.264   2.767
16      2.91     2.99     1,227        SSDIA    2.915   2.759   3.351   2.956
17      3.01     2.98     0            SSDIA    3.322   3.399   3.129   2.522
18      3.07     3.02     2,411        SSDIA    3.206   3.097   3.167   2.558
19      3.11     3.04     0            SSDIA    3.438   3.447   3.185   2.541
20      3.12     3.21     1,862        SSDI     3.436   3.346   3.556   2.939
21      3.12     3.19     9,801        SSDI     3.349   3.435   3.433   3.064
22      3.18     3.21     3,053        SSDI     3.374   3.394   3.408   2.877
23      3.26     3.28     1,213        SSDIA    3.644   3.467   3.452   3.034
24      3.32     3.35     3,670        SSDI     3.581   3.536   3.487   2.981
25      3.36     3.38     88,775       SSDIA    3.407   3.283   3.667   3.272
26      3.37     3.35     3,091        SSDI     3.526   3.650   3.355   2.824
27      3.39     3.40     3,355        SSDIA    3.816   3.699   3.559   2.935
28      3.45     3.44     5,734        SSDI     3.654   3.698   3.533   2.907
29      3.48     3.47     3,872        SSDIA    3.512   3.479   3.651   3.064
30      3.56     3.54     9,815        SSDIA    3.757   3.747   3.660   2.981
31      3.47     3.59     168,599      SSDI     3.739   3.686   3.676   3.046
32      2.93     3.00     6,649        SSDIA    3.239   3.339   3.323   2.966
33      3.52     3.53     168,450      SSDIA    3.715   3.821   3.633   3.025
34      2.68     3.04     0            SSDI     3.091   3.044   3.261   2.827



In general, the Shell and SSDI algorithms gave comparable compressed 
bit rates as did the adaptive Huffman and Rice techniques. For all scenes 
the SSDIAM produced a lower bit rate than the SSDIA. The difference in bit 
rates produced between the SSDI and SSDIA algorithm varies. For well- 
behaved data the SSDIA produces a lower bit rate than the SSDI but the 
reverse occurs for data in which the band 2 sensor generates anomalous data. 
This situation results from the averaging operation performed which includes 
intensities from a scan line of correct data and intensities from the scan 
line containing bad data. This produces large first differences within 
spectral band 2 and second differences which are not correlated with either 
band 1 or band 3. This effect produces disturbances in the SSDIA extending 
over 2 scan lines of compressed data each time a defective scan line occurs, 
resulting in poor compression for one third of the scene. This averaging 
operation is also performed for the SSDIAM and likewise produces a high compressed bit 
rate for such anomalous data. This effect is evident from the time-varying 
compressed bit rate shown in Figure 3.4-1. 

The effect of such data anomalies is less severe for the SSDI 
algorithm but some degree of degradation is still produced, with the severity 
depending on the form of source coding which is used for the SSDI symbols. 
The global Huffman code becomes less efficient because one sixth of the 
scan lines (those containing anomalous data) produce symbol statistics 
quite different from the statistics for the other lines. The Huffman code 
generated based on the symbol statistics for the entire scene is neither 
optimal for the normal data nor for the anomalous data symbols. This same 
variation in line symbol statistics corrupts one third of the scan lines 
when using the adaptive Huffman code which uses the statistics developed 
for the symbols on one scan line for encoding symbols from the next line 
of data, a process which requires a fairly high correlation of symbol 
statistics from line-to-line to be effective. The Rice technique normally 
performs better than the global or adaptive Huffman methods for such data 
since it generates a code based solely on the statistics of a block of 
symbols contained within a single scan line. This ability to rapidly adjust 
to changing statistics is advantageous for segments of defective data. 
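
The way a single anomalous line penalizes two lines under line-to-line adaptation can be illustrated with a simplified model. The Python sketch below does not build Huffman codes. As a stand-in, it charges each symbol the ideal code length of -log2(p) bits, with p estimated from the preceding line using add-one smoothing. The smoothing, the five-symbol alphabet, and the sample lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def ideal_bits_per_symbol(line, model_line, alphabet):
    """Average ideal code length when `line` is coded with statistics
    taken from `model_line`."""
    counts = Counter(model_line)
    total = len(model_line) + len(alphabet)  # add-one smoothing (assumption)
    bits = sum(-math.log2((counts[s] + 1) / total) for s in line)
    return bits / len(line)


alphabet = [-5, -1, 0, 1, 5]
normal = [0] * 12 + [1] * 2 + [-1] * 2   # symbols clustered about zero
anomalous = [5] * 8 + [-5] * 8           # large differences from a bad sensor
lines = [normal, normal, anomalous, normal, normal]

# Cost of each line when coded with the statistics of the line before it.
costs = [
    ideal_bits_per_symbol(lines[n], lines[n - 1], alphabet)
    for n in range(1, len(lines))
]
```

The cost rises both for the anomalous line, which is coded with statistics from normal data, and for the normal line that follows it, which is coded with statistics from the anomalous data. With one defective line in six, two lines in six are affected, consistent with the one third figure given above.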

Figure 3.4-1. Global Huffman Code for Shell Symbols, Scene 3 


Appendix B contains a complete set of computer output for the principal 
full scene, and contains the global Huffman code generated for the Shell, 
SSDI, SSDIA, and SSDIAM symbols. Figures 3.4-1 through 3.4-3 present 
global Huffman codes which illustrate typical forms such codes can assume. 
Figure 3.4-1 shows the Huffman code generated for the Shell symbols of 
scene 3, where shell 1 corresponds to the inner shell with maximum level of 
zero. The $\chi^2$-distribution peaks at the second shell, and no levels are 
occupied beyond the fifth. 

Figure 3.4-2. Global Huffman Code for SSDI Symbols, Scene 12 

Figure 3.4-3. Global Huffman Code for SSDIAM Symbols, Scene 3 

Figure 3.4-2 presents a typical SSDI global Huffman code for a source 
symbol distribution based on data having a moderately high activity. The 
first column gives the symbol level (only symbols between -20 and +20 are 
shown, since symbols outside this range are always in the lumped group). 
The second column gives the probability of occurrence of that symbol, the 
third column gives the number of bits in the symbol codeword, and the last 
column gives the actual codeword generated. An asterisk by 
the length indicates that the symbol forms a part of the lumped grouping. 
In this example, the lumped codeword prefix is 1100, of length four bits. 
The total probability of symbols included in the lumped grouping is .033 
for this example. The average codeword length for this scene is 3.106 
bits/sample and the symbol entropy is 2.951 bits/sample. 

The global Huffman code for the SSDIAM symbols generated for scene 
3 is shown in Figure 3.4-3. This code is typical of those generated for 
data having a low source activity: only three symbols, with a total 
probability of .994, are encoded directly, with the remainder falling in the lumped 
group with prefix codeword 110. Since no Huffman codeword contains less 
than one bit, the ratio of the average codeword length to the symbol entropy 
increases as the entropy decreases. This is evident in Figures 3.4-2 and 
3.4-3 where the respective ratios are 1.053 and 1.2. 
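
The growth of this ratio at low entropy can be illustrated with a two-symbol source, for which a Huffman code always spends one bit per symbol. The Python sketch below also checks the ratio for Figure 3.4-2 from the values quoted above. The two-symbol source and its probabilities are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-symbol source with probabilities p and 1 - p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Ratio of average codeword length to entropy quoted for Figure 3.4-2.
ratio_fig_2 = 3.106 / 2.951

# Two-symbol illustration: the code length stays at 1 bit/symbol while the
# entropy shrinks as one symbol becomes dominant.
ratios = {p: 1.0 / binary_entropy(p) for p in (0.5, 0.9, 0.99)}
```

The first ratio rounds to 1.053, in agreement with the text. In the two-symbol illustration the ratio is exactly 1 at p = 0.5 and exceeds 12 at p = 0.99, which is why block-oriented codes that can go below one bit per sample are attractive for very quiet data.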

Figures 3.4-4 and 3.4-5 contain segments of time-varying compression 
and buffer statistics for scenes of high and low data activity. These two 
figures convey a great deal of information regarding data characteristics and 
permit comparisons of the performance and limitations of the various 
algorithms. The first column of Figure 3.4-4 gives the scan line numbers 
(from the beginning of the subscene); columns 2 and 3 give the buffer 
fill (in bits) at the end of each scan line based on fixed output rates 
of 3.0 and 3.5 bits/sample; columns 4 through 6 give the average bit rate 
for that line based on the global Huffman, adaptive Huffman, and Rice 
algorithms, respectively. For this figure, SSDI symbols are used and the 
buffer statistics are based on the global Huffman output of scene 12. 

Several observations can be made concerning Figure 3.4-4. First, 
the source data activity increases from line 36 to about line 61 and 
then begins to decrease. As the average bit rate per line for column 4 
increases above the buffer output rates, the buffer fill increases until 
the rates of column 4 fall below the buffer rates. The peak buffer fill 
for an output rate of 3 bits occurs on line 67 while the peak buffer fill 
for an output rate of 3.5 bits occurs on line 62. 

Figure 3.4-4. Time-Varying Compression Statistics, Scene 12 

Figure 3.4-5. Time-Varying Compression Statistics, Scene 3 

On lines of lower data activity the Rice algorithm normally yields 
the lowest average bit rate. One reason for this is that the Rice al- 
gorithm can produce a bit rate as low as .37 bits/sample while the Huffman 
codes can never be less than 1 bit/sample. As the average bit rate increases 
the Huffman codes can perform better than the Rice in some cases since they 
do not require the transmission of overhead bits for each block of data. 
Another limitation of the Rice algorithm, as simulated here, is the use of 
fixed codes for the fundamental sequence; these fixed codes are not 
optimal for all blocks. The performance of the global Huffman code depends 
on how well the symbol statistics of a scan line match the symbol statistics 
developed for the entire scene, and the performance of the adaptive Huffman 
code depends on the correspondence of the symbol statistics of the line 
being encoded with the symbol statistics of the preceding line. Therefore, 
while the adaptive algorithms yield performance superior to the fixed Huffman 
algorithm for the entire scene, this is not necessarily true on the basis of 
any individual scan line. As an example, the global Huffman is best for 
line 52, the adaptive Huffman is best for line 41, and the Rice technique is 
best for line 36. 

Figure 3.4-5 portrays similar time-varying statistics based on 
SSDIA symbols for a segment of data from scene 3 having low source activity. 
This scene also differs from the preceding one in that the data contains 
anomalous sensor output on every sixth scan line. This 
sensor defect disturbs the various algorithms in differing degrees, as 
can be observed in the performance shown in the figure. Since the Rice 
algorithm encodes each line separately, the increased bit rates on lines 
37, 38, 43, 44, etc., reflect the disturbance of the SSDIA symbols 
themselves. Since the adaptive Huffman algorithm uses the statistics of 
symbols in one line for encoding the next line, the anomalous data produces 
effects that propagate further. The global Huffman is an intermediate case 
since each line is encoded based on the code developed for the entire scene. 
This global code is affected by the defective sensor data statistics and 
produces a code which generates a higher bit rate for all scan lines. 
Figure 3.4-5 also shows lines for which the Rice algorithm produces average 
bit rates of less than one bit/sample including overhead bits. 

Figures 3.4-6 through 3.4-9 show buffer statistics for two scenes 
based on the SSDIA/Rice algorithm. Figures 3.4-6 and 3.4-7 are based on 
scene 15 which is centered on Catalina Island and extends into the Pacific 
Ocean on either side. This scene produces an average bit rate of 2.89 bits. 
Figure 3.4-6, based on a buffer output rate of 3.5 bits, shows buffer 
underflow until scan line 48, at which point the average compressed bit rate 
exceeds 3.5 bits per line. The bit rate per line begins to fall below the 
buffer output rate around scan line 107 at which point the buffer fill 
decreases. 
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Figure 3.4-6. Buffer Statistics of Scene 15 (Catalina Island) 



Figure 3.4-7. Buffer Statistics of Scene 15 (Catalina Island) 
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BITS IN BUFFER 




Figure 3.4-7, based on a buffer output rate of 3 bits, illustrates 
the same effect except that buffer fill begins around scan line 29 where 
the compressed bit rate exceeds 3 bits/line and peaks at scan line 110. 

For an output buffer rate of 3 bits the peak buffer fill is 43,632 bits 
and for an output buffer rate of 3.5 bits the peak buffer fill is 19,415 bits. 

Figures 3.4-8 and 3.4-9 are based on scene 23 which contains sensor 
data anomalies in spectral band 2. This scene produces an average compressed 
data rate of 3.28 bits. Figure 3.4-8 dramatically illustrates the effects 
produced by the defective sensor in band 2 as peaks in the buffer fill 
occurring every sixth scan line. If the sensor defect were not present 
the average compressed rate would be around 2.85 bits, well below the 
buffer output rate. Due to this defect the peak compressed data rate 
exceeds 4 bits on many of the affected scan lines. The peak buffer fill 
is 1213 bits. Figure 3.4-9 illustrates the use of a buffer with output 
rate (3 bits) below the average compressed data rate. The buffer fill 
continues until it reaches 32,003 bits at the end of the scene. The effect 
of the sensor anomaly is also evident in Figure 3.4-9 as peaks and valleys 
superimposed on the buffer statistics. 
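
The buffer behavior described above can be illustrated with a short
simulation. The following is a minimal sketch and not the program used in
the study. It assumes that the buffer gains the compressed bits of each scan
line, loses bits at a constant output rate, and cannot fall below zero
(underflow). The function name buffer_fill, the number of samples per line,
and the per-line bit rates are hypothetical values chosen for illustration.

```python
def buffer_fill(line_rates, output_rate, samples_per_line):
    """Return the buffer fill (in bits) after each scan line.

    line_rates       -- compressed bit rate of each line, bits/sample
    output_rate      -- constant buffer output rate, bits/sample
    samples_per_line -- samples per scan line (hypothetical value below)
    """
    fill = 0.0
    history = []
    for rate in line_rates:
        # Net change: bits written by the compressor minus bits read out.
        fill += (rate - output_rate) * samples_per_line
        # An empty buffer cannot go negative; this is the underflow case.
        fill = max(fill, 0.0)
        history.append(fill)
    return history


# Hypothetical rates: quiet lines, a burst of active lines, quiet lines again.
rates = [2.5] * 4 + [4.0] * 3 + [2.5] * 4
fills = buffer_fill(rates, output_rate=3.0, samples_per_line=1000)
peak = max(fills)
print(peak, fills[-1])
```

With these assumed inputs the buffer stays empty while the line rate is below
the output rate, fills during the active lines, and drains afterward, which
is the qualitative pattern reported for Figures 3.4-6 and 3.4-7.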

The Rice algorithm uses a fixed block size of 16 pixels (64 symbols) 
along a scan line for all scenes processed. This block size was determined 
early in the investigation by processing segments of three subscenes 
(numbers 3, 10, and 20) with varying block sizes. The results are given 
in Figure 3.4-10. While the performance of the Rice algorithm is dependent 
on the block size used, and ideally this parameter should be adaptive, com- 
pression is only weakly dependent on block size over the range of twelve to 
twenty pixels per block. As the block size decreases below this range the 
contribution of the overhead bits becomes a significant percentage of the 
overall bit rate. If the block length is too large, the symbol statistics 
can vary significantly over the block and degrade the compression achieved. 
The block size of 16 pixels was selected as a compromise between these two 
conflicting requirements and it is felt that any degradation that results 
from this choice is not severe.

Figure 3.4-10. Performance of Rice Algorithm with
Variation of Block Size


Table 3.4 shows the percentage distribution of Rice modes for
several of the scenes processed, reflecting the data activity within each scene.

For scenes having low data entropy, mode CFS predominates; for scenes having
moderate data activity, mode FS predominates; and for scenes of high data
activity the complemented mode (CFS with an overbar on FS) predominates. The
only split-pixel mode which appeared
in the scenes processed is the (6,1) mode. This is to be expected since 
the other split-pixel modes appear only when the symbol entropy exceeds 
five bits. 

Table 3.4. Percentage Distribution of Rice and Split-Pixel Modes 
for Selected Scenes with Varying Data Activity

3.5 COMPARISON OF SPACECRAFT AND GROUND PROCESSED DATA

Scene number 28, containing mountains, was run using both MSS tapes 
before and after ground processing. The former tape contains data 
generated directly by the spacecraft and permits a comparison of the 
data statistics and compression algorithm performance with the ground pro- 
cessed tapes which were used throughout the study. 

The means of the data differ slightly and are: 



             Band 1   Band 2   Band 3   Band 4   Average

Spacecraft    32.86    31.00    39.74    20.27     30.97
Ground        34.96    33.50    38.81    19.28     31.64


This difference is due to the decompression performed on the ground.
Figure 3.5-1 contrasts the cross spectral-spatial correlations of the two
forms of data and shows a higher correlation for the spacecraft data.
Figures 3.5-2 and 3.5-3 show corresponding plots of the joint probability
function of first differences from bands 3 and 4, illustrating a somewhat
closer clustering of differences about the origin for the spacecraft data.



Figure 3.5-1. Spectral-Spatial Correlation of
Scene 28 Based on Spacecraft (S)
and Ground-Processed (G) Data

Figure 3.5-2. Joint Probability Density of Band 3 and Band 4, Scene 28 (Spacecraft Data)

Figure 3.5-3. Joint Probability Density of Band 3 and Band 4, Scene 28 (Ground-Processed Data)



The most interesting comparison is based on the compressed bit 
rate achieved by each algorithm on the two forms of MSS data: 



                       Spacecraft Data   Ground Processed Data

SHELL                       3.512               3.654
SSDI                        3.566               3.698
SSDIA                       3.448               3.533
SSDIAM                      2.927               2.907
SSDI/Adaptive Huff.         3.33                3.45
SSDI/Rice                   3.32                3.44


The compression achieved for spacecraft data was higher for all
techniques except the SSDIAM. While the discrepancy for SSDIAM
processing is not fully understood, the improvement achieved with ground
processed tapes may result from the compensating interaction between the
SSDIAM intensity mapping and the intensity mapping performed during ground
processing. The buffering statistics are correlated with the difference
in compressed bit rates for the two forms of data, with peak buffer fill
(in bits) tabulated below.

Output Buffer Rate Spacecraft Data Ground Processed Data 

3.0 36,820 49,875 

3.5 4,188 5,734 

3.6 RECONSTRUCTED IMAGERY 

Four full scenes were compressed and reconstructed using a variety
of strictly information preserving algorithms, and the Principal Full Scene
(PFS), number 31, was compressed and reconstructed to evaluate the effects
of channel errors and essentially information preserving distortion on the
reconstructed imagery. Photographs P1 through P17, contained in Appendix A,
were generated from the reconstructed data.

The photographic imagery was reconstructed from tapes formatted as
shown in Figure 2.4-5. Several precautions were taken to ensure uniformity
of film and print processing. All four spectral bands of a given full
scene were placed on the same film negative and developed simultaneously.
All prints were developed at the same time, as were the enlargements of
individual spectral bands, to prevent gross variations in contrast and to
ensure a uniform enlarging size.

Photographs P1 through P4 give the four spectral bands of the four
full scenes compressed and reconstructed using various combinations of
the strictly information preserving algorithms evaluated in this investigation.
Image P1 shows scene number 31, which contains a variety of object
classes including agriculture, mountains, barren soil, and a river. An
airport is located in the top section of the images, on the left side of
the agricultural area. This scene was selected as the principal full
scene (PFS) since it contains such a variety of object classes and
areas of varying degrees of data activity.

Image P2 is based on scene number 32 containing part of Lake St.
John, the Saguenay River, forest, and the city of Alma (Quebec), Canada.
Image P3, from scene number 33, predominantly contains coastal vegetated
mountains of Southern California interspersed with small lakes. Image P4,
from scene number 34, contains the Mojave Desert and scattered irrigated
agricultural areas.

Image P1 was compressed and reconstructed using the SSDI/Rice
algorithm, image P2 used the SSDIA/Rice, image P3 used the SSDIA/Huffman,
and image P4 used the SSDI/Huffman. This selection of compression algorithms
effectively illustrates the capability of all four combinations of
techniques to provide the strictly information preserving compression and
reconstruction of MSS data.

Photographs P5 through P8 contain enlargements of the individual 
spectral bands of the principal full scene. These enlargements are based 
on the original MSS tape data corresponding to the full scene processed by 
the various algorithms and provide a basis for comparison of the original 
imagery with the processed and reconstructed imagery contained in 
photographs P1 and P9 through P17.

Photographs P9 through P16 contain the individual spectral bands
of the principal full scene after compression and reconstruction by the
essentially information preserving SSDIAM/Rice algorithm. For photographs
P9 through P12, a mapping of |m| = 1 was used and each reconstructed
intensity level is either the same as the original level or deviates from
the original by +1 or -1 intensity levels. A mapping of |m| = 3 was used
for photographs P13 through P16, allowing each reconstructed intensity level
to deviate as much as three intensity levels from the original sample level.

The strictly information preserving compression of the scene produced
an average bit rate of 3.59 bits/sample and no distortion. With |m| = 1
the average compressed bit rate was 2.71 bits/sample with a mean square
distortion of 0.0112 percent. With |m| = 3, the average compressed bit rate
was 1.91 bits/sample with a mean square distortion of 0.103 percent. Figures
3.6-1 and 3.6-2 give error plots showing the magnitude of the deviation
between the original and reconstructed intensity levels for a portion of
band 3 of the principal full scene. Figure 3.6-1 is based on the mapping
|m| = 1 and Figure 3.6-2 is based on the mapping |m| = 3.

The photographs show that no noticeable visual degradation results
from the use of mapping |m| = 1 since this level is comparable to the system
noise including that introduced by the ground processing algorithm. A mapping
of |m| = 3 does introduce visual degradation, especially in areas of uniform
intensity such as the valley floor to the left of the agricultural area. This 
contouring is most noticeable in spectral band 4 (infrared). Fortunately, 
the algorithm tends to reproduce areas of medium and high data activity well 
without severely degrading edges or introducing slope overload and overshoot 
as often occurs in delta modulation algorithms. 

Figure 3.6-1. Error in Levels Between Original and Reconstructed Samples
in Band 3 of Scene 31 (|m| = 1)

Figure 3.6-2. Error in Levels Between Original and Reconstructed Samples
in Band 3 of Scene 31 (|m| = 3)

Photograph P17 illustrates the effects of channel errors on the
performance of the strictly information preserving SSDI/Rice algorithm.
Photograph P17 is based on a bit error rate of 10^-5. This value was used
since it is felt that channels used for the transmission of ERTS data will
have an error rate better than 10^-5. Errors were simulated on the computer
by changing bit values in the compressed data bit stream at the appropriate
error rate. Such an error affects the reconstruction algorithm and
produces anomalous reconstructed data values until the next memory update
point, where the algorithm again synchronizes with the compressed data
and generates correct data until the next error occurs. The current
simulation only provides such a memory update at the beginning of each
scan line, but the algorithm can be easily modified to produce more
frequent updates.
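
The error simulation described above can be sketched as follows. This is an
illustrative sketch only, not the simulation program used in the study. It
assumes that bit errors are independent and that each bit is inverted with
a probability equal to the channel error rate. The function names, the seed,
and the update spacing are hypothetical. The second function measures the
span from an error to the next memory update in compressed bit positions,
which is a simplification of the pixel count affected after reconstruction.

```python
import random


def inject_bit_errors(bits, error_rate, seed=0):
    """Flip each bit independently with probability error_rate.

    Returns the corrupted bit list and the positions that were flipped.
    """
    rng = random.Random(seed)
    corrupted = list(bits)
    flipped = []
    for i in range(len(corrupted)):
        if rng.random() < error_rate:
            corrupted[i] ^= 1
            flipped.append(i)
    return corrupted, flipped


def corrupted_span(error_pos, update_interval):
    """Bits from an error up to (not including) the next memory update.

    update_interval is the assumed spacing, in bits, between update points.
    """
    next_update = (error_pos // update_interval + 1) * update_interval
    return next_update - error_pos


# Hypothetical compressed stream of zeros passed through a 10^-5 channel.
stream = [0] * 200000
noisy, where = inject_bit_errors(stream, 1e-5)
print(len(where), [corrupted_span(p, 20000) for p in where])
```

Shortening update_interval in this sketch shortens every corrupted span,
which is the motivation for inserting more frequent memory updates.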

3.7 IMPLEMENTATION OF THE SSDI/RICE ALGORITHM 

An investigation of the implementation of the SSDI/Rice algorithm 
was conducted concurrently with the simulation activities to determine 
the logical data flow for such a system and estimates of hardware com- 
plexity. Although the baseline system developed could not be optimized 
within the scope of the present study, the investigation performed serves 
to provide a basis for further study and refinement and permits tradeoffs 
pertinent to decisions involving the use of such a data compression system 
on board future earth observation satellites. 

The SSDI/Rice Data Compression Unit (DCU) functional block diagram is 
given in Figure 3.7-1. A flow chart of the DCU information transfer and 
processing is illustrated in Figures 3.7-2 and 3.7-3. Figure 3.7-4 through 
Figure 3.7-7 depict the basic hardware components necessary to implement the 
SSDI and Rice algorithms. The data flow begins in Figure 3.7-4 and continues
to Figure 3.7-7 with the functional blocks listed identifying the rela- 
tionship to the block diagram in Figure 3.7-1. The notation used in labeling 
the components is listed in Figure 3.7-4. The baseline implementation 
utilizes SSI and MSI Low-Power Schottky TTL (SN54LSXXX) combined with bi- 
polar 1024 x 1 RAM's. Custom LSI (TRW Emitter Follower Logic) was investi- 
gated for repeated functions such as the ALU/Latch. The estimated impact 
was principally in a reduction of total parts and power of 10% and 0.8W 
respectively as compared to the baseline. Since the development cost of 
the LSI chip is sizable and the net effect on the DCU is minor, LSI was not 
included in the baseline design.

The baseline DCU without a power converter is summarized below:

• Data clock rate: 5 MHz (equivalent data rate 35 Mb/s)

• DCU Parts Total: 263 IC, 60 Discretes or 323 total

• Power: 26.7 watts

• Weight: 3.0 pounds

• Volume: 192 cubic inches (6" x 8" x 4").

The maximum clock rate capability is 10 MHz for the given design. The 
rate can be further increased to 15 MHz maximum if selected components are 
substituted with standard Schottky TTL MSI and speed enhancement IC's used 
for Look Ahead Carry Generators for the Arithmetic/Logic Units (ALU's). 
There is then a penalty in additional parts and power dissipation with a 
corresponding slight increase in packaging.

The unit characteristics for the baseline configuration are tabulated
in Table 3.5 for three types of units: I for 10 MHz, II for 15 MHz, and
III for 25 MHz. Type III was sized using Schottky TTL with Emitter-Coupled
Logic (ECL) at critical points. The resulting characteristics, especially
power, are very large and are shown for comparative purposes. An optional
power converter is also listed in the same terms as the DCU and would be
directly additive to the associated DCU type. Example: Power (Type I) =
26.7 + 9 = 35.7 watts total.

While Table 3.5 shows three technologies providing increasing
input data rate capabilities of up to 175 megabits per second, a severe
penalty in power, weight, and volume is imposed by a Type III implementation.
If the system must run at data rates higher than 105 megabits per second,
a different form of the SSDI/Rice should be developed so that certain system
blocks process separate blocks of input data in a parallel fashion, later
reassembling the data into the appropriate serial bit stream. The output
data buffer size is a function of the buffer output bit rate and should
ideally adapt its parameters to the time-varying compression statistics.
A number of developing technologies, such as bubble and CCD memories, could
permit the use of buffers having millions of bits of storage and occupying
a relatively small volume.

Table 3.5. DCU Unit Parameters

        INPUT       INPUT
        CLOCK RATE  DATA RATE  ------ PARTS ------  POWER    NUMBER     WEIGHT  VOLUME
TYPE    (MHz)       (Mbps)     IC     DISC   TOTAL  (WATTS)  OF SLICES  (LBS)   (IN^3)

I        5           35        263     60    323     26.7    2          3.0     192
I       10           70        263     60    323     26.7    2          3.0     192
II      15          105        290     60    350     30.7    2-1/2      3.7     192
III     25          175        333    160    493    108.0    3          4.5     384


POWER CONVERTERS

        ------ PARTS ------  POWER    NO. OF  WEIGHT  VOLUME
TYPE    IC     DISC   TOTAL  (WATTS)  SLICES  (LBS)   (IN^3)

I       10     240    250      9.0    1       1.5      96
II      10     240    250     10.0    1       1.5      96
III     20     250    270     36.0    1       3.0     128


Notes: 1. Slice form factor 6" x 8" x 2"

       2. Power converter is optional, hence, add as necessary.
          Assumed efficiency is 75%.

4. SUMMARY AND CONCLUSIONS OF THE INVESTIGATION


This section summarizes and discusses the major results of this 
investigation and the relevance of the study to the ERTS program. The 
intent of this section is to provide information which can assist in the 
planning of future ERTS missions relative to the use of data compression 
either on board the satellite or for ground-based applications. 

4.1 DISCUSSION OF RESULTS 

The information obtained by the processing of thirty-four ERTS MSS
scenes permits the drawing of certain conclusions regarding the relative
performance of the several data compression algorithms evaluated. Since the
results obtained are based on the necessarily finite number of ERTS scenes
used, the conclusions reached are strictly valid only for this data but are
representative of the level of performance to be expected for similar data.
For example, if a given object class subscene yielded a compressed bit rate
of 2.5 bits/sample, other subscenes containing this class should produce
similar results. The variance of data statistics should also be minimal for
similar object classes, assuming that the data does not contain sensor or
processing anomalies such as that produced by the defective sensor in
spectral band 2.

4.1.1 Data Statistics and Compression Performance 

The MSS data used was based on ground processed tapes and one 
unprocessed spacecraft tape. The ground processing includes a decompression 
algorithm which performs a mapping of intensity levels in bands one through 
three. The effects of the mapping can be seen from a comparison of the 
probability density of the intensity levels taken from a ground processed 
tape, Figure 3.3-3, and the corresponding distribution of intensities from the
same segment of data before ground processing, Figure 3.3-2. The processed
tape shows a redistribution of intensities with a corresponding omission of
certain intensity levels present in the spacecraft data. The effect of ground
processing on the performance of the compression algorithms will be discussed
later in this section.

Some of the scenes processed reveal the presence of anomalous
sensor data in spectral band 2. As shown in a segment of data values in 
band 2 of scene 8, Figure 3.3-4, the anomalous data occurs every sixth scan 
line when present. Such defective data is uncorrelated with the data in the 
adjacent scan lines and with the equivalent scan lines in the other spectral 
bands, producing a degradation in the performance of the data compression 
algorithms which require a high degree of spatial and spectral correlation 
in order to obtain good compression. 

For the data tested that did not contain this anomaly, the measured
data statistics were fairly accurate indicators of data compression performance, 
except for the variances of data intensities. This variance measures the 
global variation in intensity over the scene, whereas the performance of the 
compression algorithms is dependent on local variations. Therefore, a scene 
for which the intensity levels are very uniform over local areas while the 
mean intensity changes dramatically over large regions can exhibit both a 
high variance and a low compressed bit rate. Conversely, a scene which 
exhibits a low global data variance in all spectral bands produces a low 
compressed bit rate. The joint probability distribution function of first 
difference values is a better indicator of the degree of compression possible 
for a scene, with compression increasing as the clustering of joint differences 
tightens around the origin. The cross spectral-spatial correlation is a
valid indicator of the average spectral correlation of the scene over local
regions. The less rapid the fall-off and the closer the spacing of the 90%,
95%, and 99% correlation curves, the higher the compression.

Table 4.1 summarizes the average compression obtained with each 
algorithm for the uniform object classes evaluated. For multiple scenes 
containing the same object class, the range of compression results falls 
within ten percent of the average value, except for the object class labeled 
forests where the compression deviates up to thirteen percent from the median 
value. Such differences reflect variations of location, sun angle, and time 
of year among the scenes processed. The compressed bit rates generally reflect 
the level of activity in the scene, progressing from the low data activity seen 
over large bodies of water to the high activity occurring from field-to-field 
in agricultural areas.

Table 4.1. Average Compressed Bit Rates for Uniform Object Classes
(Based on Scenes Processed)

                                     ------- GLOBAL HUFFMAN -------
OBJECT CLASS   AD. HUFF.   RICE      SHELL    SSDI     SSDIA    SSDIAM

CLOUDS         1.42        1.31      1.697    1.508    1.491    1.427
WATER          1.51        1.40      1.956    1.921    1.635    1.550
SNOW           1.78        1.79      2.014    2.135    1.890    1.453
PLAINS         2.55        2.67      2.827    2.790    3.251    2.724
DESERT         2.74        2.84      3.081    2.964    2.983    2.484
MOUNTAINS      3.23        3.24      3.403    3.500    3.362    2.798
CITY           3.06        3.09      3.379    3.373    3.343    2.731
FOREST         3.10        3.11      3.339    3.370    3.368    2.891
GRASSLAND      3.32        3.35      3.581    3.536    3.487    2.981
AGRICULTURE    3.41        3.42      3.640    3.566    3.593    3.042



The entries in Table 4.1 can be used as a guide for determining the
expected level of compression for another scene containing the same object
class. The compression of a full scene of data can be roughly estimated by
weighting the average compressed bit rate of each object class contained in
the scene by the percentage occurrence of the class over the scene. The
resulting rate should be fairly close, especially if the adaptive Huffman
or Rice algorithms are used. The estimate could be too low for the global
Huffman algorithm if the bit rates of the object classes in the scene
substantially differ, as in the case of a large lake surrounded by agriculture
or forest, since the code developed for the entire scene may be a poor match
to local areas. The estimate should also be tempered by the presence of haze,
smog, or clouds. Haze and smog improve compression by lessening the apparent
data activity as seen by the satellite, whereas small broken clouds can produce
large intensity jumps at their periphery, decreasing compression. Large cloud
cover, however, can be considered as a separate uniform object class when
performing such estimates.
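
The weighting procedure described above amounts to a weighted average. The
following minimal sketch uses the Rice column of Table 4.1 for three object
classes. The scene composition (20% water, 50% forest, 30% agriculture) and
the function name estimate_scene_rate are hypothetical and serve only to
illustrate the arithmetic.

```python
# Average Rice bit rates (bits/sample) for three classes from Table 4.1.
RICE_RATE = {"WATER": 1.40, "FOREST": 3.11, "AGRICULTURE": 3.42}


def estimate_scene_rate(class_fractions, class_rates):
    """Weight each class bit rate by its fractional area in the scene."""
    total = sum(class_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class fractions must sum to 1")
    return sum(frac * class_rates[name]
               for name, frac in class_fractions.items())


# Hypothetical scene composition, not taken from the study.
mix = {"WATER": 0.20, "FOREST": 0.50, "AGRICULTURE": 0.30}
rate = estimate_scene_rate(mix, RICE_RATE)
print(round(rate, 3))  # about 2.86 bits/sample
```

As noted in the text, such an estimate does not account for haze, smog, or
broken clouds, and may be optimistic for the global Huffman code.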

Based on the scenes processed, the average bit rates for the various
compression techniques are as follows:

                    ------- Global Huffman -------   Adaptive
                    SHELL   SSDI    SSDIA   SSDIAM   Huffman    Rice

Average bit rate    2.99    2.98    2.92    2.50     2.67       2.70


Based on the actual compressed tapes generated for the 25 x 25 n.mi.
full scenes, a compression ratio of 2:1 produces a compressed bit stream that
fills 7.1 percent of a standard reel of magnetic tape. This result and the
average bit rates achieved for the scenes processed indicate that almost all
100 x 100 n.mi. scenes could be compressed to occupy a single reel of tape as
opposed to the four reels now required. This result can be of great economic
benefit to NASA for storage, transmittal, and archiving of ERTS imagery data.
With an essentially information preserving algorithm such as the SSDIAM, an even
greater compaction of the data would result.

The above measures of data compression performance and compressed bit
rate are based on the use of ground processed tapes. One tape of spacecraft
data, before ground processing, was received and evaluated. The results of
processing a given subscene of this spacecraft data and the same subscene
after ground processing indicated a lower compressed bit rate with the
spacecraft data for all the strictly information preserving data compression
techniques used. This decrease in bit rate is probably due to the higher
correlation of the spacecraft data, which does not contain the deleterious
effects of the ground decompression mapping of intensity levels.

It is not possible to decisively conclude that all spacecraft data 
will produce a lower compressed bit rate than the corresponding ground processed 
data on the basis of the single spacecraft tape processed. However, since the 
bit rates differed by only a few percent, it is reasonable to conclude that 
the results achieved by the processing of ground processed tapes during the 
investigation are applicable to spacecraft data with only minor changes in 
the bit rates achieved with each source of data. 

4.1.2 Effects of Distortion and Channel Errors 

The SSDIAM/Rice algorithm was used on the principal full scene to
evaluate the effects of essentially information preserving compression;
the reconstructed imagery is shown in photographs P9 through P16 of
Appendix A. The results indicate that a substantial decrease
in bit rate can be achieved by this form of data compression without introducing 
a severe distortion of the reconstructed image. For the example illustrated, 
the strictly information preserving compressed bit rate was 33% larger than 
the SSDIAM bit rate for mapping |m| = 1 and about 90% larger than the 
SSDIAM bit rate for mapping |m| = 3. The distortion introduced for mapping 
|m| = 1 cannot be discerned visually when compared with the original imagery 
and the effect on the data is comparable to that introduced by decompression 
during ground processing. The distortion produced by the mapping |m| = 3 can 
be seen especially in areas having a relatively uniform intensity where the 
effect is similar to the contouring often seen in imagery compressed by delta 
modulation techniques. The effects of distortion are less evident in areas of 
high data activity and no slope overload or overshoot effects are produced by 
the SSDIAM algorithm. 
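
The percentages quoted above follow from the bit rates reported in
Section 3.6 for the principal full scene (3.59, 2.71, and 1.91 bits/sample).
The short check below is only a restatement of that arithmetic; the constant
and function names are hypothetical.

```python
STRICT = 3.59  # bits/sample, strictly information preserving
M1 = 2.71      # bits/sample, SSDIAM with |m| = 1
M3 = 1.91      # bits/sample, SSDIAM with |m| = 3


def percent_larger(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


print(round(percent_larger(STRICT, M1), 1))
print(round(percent_larger(STRICT, M3), 1))
```

The two results are close to 32.5 and 88 percent, which the text rounds to
33% and about 90%.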

An adaptive form of the SSDIAM is suggested based on the results of 
this investigation. In this study a fixed mapping block size of 8 pixels 
was used. Ideally, this block size would vary with data activity as discussed
in Section 2.1.4. In addition, mapping level m would vary adaptively as a
function of both the data activity and mean block intensity level to
compensate for the logarithmic response of the eye. Note that these adaptive
block techniques only add complexity to the compression algorithm. The
reconstruction algorithm is the same whether the compression is strictly or
essentially information preserving. While an optimal form of the SSDIAM could
produce reconstructed data useful for the majority of investigations involving
visual interpretation of ERTS imagery, the effects of the distortion on
experiments involving computer processing of the data are unknown at present.
Due to the mechanization of the SSDIAM, however, in which the block spectral
means remain unchanged by the processing involved, the effects of the mappings 
may not severely degrade the accuracy of algorithms used for classification of 
crops. Since the SSDIAM does produce a significant increase in compression, the 
effects of such distortion on the results of various ERTS investigations should 
be evaluated. 

The compressed data is far more vulnerable to the effects of channel 
errors due to the removal of the redundancy present in the PCM spacecraft 
data. Such errors can occur over the transmission link from the satellite to 
the ground receiver or by dropouts on the magnetic tape storing the data. The 
effects of bit errors propagate differently depending on the compression
algorithm used. For the SSDI/Rice algorithm used in this study, a bit error 
produces incorrect reconstructed pixel intensities until the next memory update. 
This is true whether a single bit was changed or a burst of noise affected a 
number of bits. Since the present simulation updates the memory only at the 
beginning of each scan line, the number of pixels corrupted depends upon the 
point in the scan line when the first bit error occurred. 

Photographs P17 through P24 show the effects of channel bit error rates
of 10^-5 and 10^-6 on the reconstruction of the principal image. Errors occurring
at rate 10^-6 produce only minor effects on the data while errors occurring at
rate 10^-5 can be seen. Since transmission channels and tape recorders used for
the ERTS program have less than a 10^-5 bit error rate, the effects of such
errors should not be a limitation on the use of data compression. Moreover,
the propagation of such errors can be further limited by the insertion of 
several memory updates on each scan line. The overhead bits required for 
four or five memory updates per scan line would only increase the compressed 
bit rate by a fraction of one percent. 





4.1.3 Implementation of the Compression Algorithms 


Section 3 contains a mechanization of the SSDI/Rice algorithm and
a preliminary sizing of the hardware parameters associated with this system.
This combination of the SSDI and Rice techniques appears to be a viable form
of data compression for use aboard a spacecraft for several reasons:

• The SSDI is the simplest algorithm to implement and is less
affected by sensor anomalies than the SSDIA technique.

• The hardware required for the Rice implementation is moderately 
complex but capable of operation at high bit rates with current 
technology. 

• Channel errors produce distortion in the SSDI/Rice reconstructed
data that is constrained to a segment of a single scan line.
Use of either the SSDIA or adaptive Huffman techniques requires
the use of a higher quality channel since errors can propagate
over several scan lines.

• The Rice algorithm has the adaptivity required to produce a high 
degree of compression for data where the source statistics vary 
considerably over the scene. 

If the SSDIA/Rice technique is implemented as an alternative, the 
complexity required in the processor would increase slightly but an increase 
in storage would result due to the averages which must be computed using 
intensities from the previous scan line. In general, an entire scan line of 
data must be stored in each spectral band to accomplish the averaging. A 
shift register form of storage would permit the rapid access of the required 
intensity samples from the previous line. As each new intensity is recon- 
structed on the current line it is shifted into the register, shifting out 
the third intensity from the previous line required for the current averaging 
operation. The averaging of the four samples is easily accomplished by adding 
the appropriate four intensities and shifting the result two places for the 
divide operation. 

The storage of a full scan line of intensities in each spectral band 
implies a storage of almost 90 kilobits, a value which may not be excessive 
in the near future as CCD storage becomes practical. The required storage could
be halved by use of a modified SSDIA algorithm in which each two successive
reconstructed intensities are averaged and stored as a single number to be 
used for SSDIA averaging when accessed. 

Implementation of the adaptive Huffman algorithm was not studied. The 
most difficult portion of this algorithm to implement is the hardware required 
to rapidly convert the probability distribution of symbols measured on one 
scan line into a Huffman code for use on symbols in the next scan line. This 
could be done off line by the following sequence of operations: 

• Measure symbol probabilities for scan line i 

• Generate and store symbols for scan line i+1 and, in parallel, 
generate the Huffman code to be used on line i+1 

• Encode symbols from scan line i+1 during scan line i+2 

This technique requires the storage of two scan lines of symbols 
at a time but the encoding delay permits each new Huffman code to be generated 
over the time required to scan a line of new data. A suboptimal alternative 
would involve storing several fixed Huffman codes on the spacecraft 
and, at the end of each scan line, selecting the fixed code which best matches 
the measured symbol statistics for use in encoding the symbols of the 
following scan line. 
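
A minimal software sketch of the two alternatives may help fix ideas. It is not an implementation studied in this report. The function names are hypothetical, and the add-one count given to every symbol of the alphabet (so that a symbol absent from one line still receives a code on the next) is an assumption of the example.

```python
import heapq
from collections import Counter


def huffman_lengths(counts):
    """Return {symbol: code length} of a Huffman code for the given counts."""
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tie-breaker, symbols in this subtree).
    heap = [(c, i, [s]) for i, (s, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = {s: 0 for s in counts}
    tie = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge adds one bit to these symbols
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


def line_statistics(symbols, alphabet):
    """Symbol counts for one scan line, with add-one smoothing (assumption)."""
    counts = Counter(symbols)
    return {s: counts[s] + 1 for s in alphabet}


def encoded_bits(symbols, lengths):
    """Number of bits needed to code the symbols with the given lengths."""
    return sum(lengths[s] for s in symbols)


def adaptive_bits_per_line(lines, alphabet):
    """Code each line with the Huffman code built from the line before it."""
    bits = []
    for prev, cur in zip(lines, lines[1:]):
        lengths = huffman_lengths(line_statistics(prev, alphabet))
        bits.append(encoded_bits(cur, lengths))
    return bits


def best_fixed_code(symbols, fixed_codes):
    """Index of the stored code-length table that codes the line most compactly."""
    return min(range(len(fixed_codes)),
               key=lambda i: encoded_bits(symbols, fixed_codes[i]))
```

In the suboptimal scheme, best_fixed_code would be called on the symbols of the line just scanned and the chosen table applied to the following line, so that no code generation is needed on board.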

The added complexity of the SSDIM or SSDIAM hardware over that of its 
strictly information preserving counterpart is minimal. The block averaging 
required involves simple add operations and, if the number of pixels is a 
power of two (2^k), k right shifts of the sum. Each intensity sample is then 
varied up to m levels closer to this block mean before the SSDI or SSDIA 
operations are executed. 
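
The block-mean mapping can be illustrated as follows. This is a sketch under stated assumptions rather than the mapping rule used in the study: here every sample is moved the full m levels toward the block mean and is stopped at the mean, whereas the text says only that a sample is varied "up to" m levels. The names block_mean and map_toward_mean are hypothetical.

```python
def block_mean(block, k):
    """Mean of a block of 2**k samples: a running sum, then k right shifts."""
    if len(block) != 1 << k:
        raise ValueError("block must contain 2**k samples")
    return sum(block) >> k


def map_toward_mean(block, k, m):
    """Move each sample at most m levels closer to the block mean.

    Assumption: a sample never crosses the mean, so the distortion of any
    sample is at most m quantization levels.
    """
    mean = block_mean(block, k)
    mapped = []
    for v in block:
        if v > mean:
            mapped.append(max(mean, v - m))
        elif v < mean:
            mapped.append(min(mean, v + m))
        else:
            mapped.append(v)
    return mapped
```

For the block [20, 22, 25, 21] with k = 2 the mean is 22. With m = 1 the mapped block is [21, 22, 24, 22], and with m = 3 it is [22, 22, 22, 22]. The mapped samples vary less from pixel to pixel, which is what lowers the bit rate of the subsequent SSDI or SSDIA coding.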

For ground based data compression implemented by computer, a different 
set of requirements is imposed. The compression should be relatively fast to 
prevent taking up an excessive amount of computer time beyond that already 
required for processing the ERTS imagery received. The compression achieved 
should be sufficient so that the resulting economic benefits offset the 
additional processing. In addition, reconstruction algorithms should be 
efficient and capable of being performed by the user of the data in most 
cases. The SSDI/Huffman algorithm has merit in such applications and a 
preliminary approach to implementing such a ground based compression and 
reconstruction technique is outlined in Appendix D. 

4.1.4 Relative Algorithm Performance 

While no decisive statements can be made concerning the absolute 
performance of each algorithm because of the finite amount of data processed 
and the presence of anomalous data varying from scene to scene, certain 
general conclusions can be drawn. First, the adaptive Huffman and Rice 
algorithms invariably give a lower bit rate than the global Huffman algorithm 
based on the same set of symbols. This result is expected since global symbol 
statistics are rarely optimal for local areas within the scene (unless the scene 
contains a very uniform object class). The use of an adaptive coding tech- 
nique is mandatory aboard a spacecraft where all object classes are observed. 

Due to implementation considerations, the Rice algorithm appears to be a viable 
candidate since it is capable of operating at high data rates and requires no computation 
or storage of source symbols beyond those in the block being processed. As 
shown in Appendix D, the global Huffman code has advantages for the ground 
compression of individual 100 x 100 nmi frames of ERTS data for which the 
global statistics are sufficient to obtain a significant compaction of data 
allowing the savings of several tapes per frame. 

Since the performance of the SHELL algorithm is comparable to that of 
the SSDI and the implementation of the SSDI algorithm is less complex, the 
SHELL technique has little merit for further consideration. Of the two strictly 
information preserving algorithms, SSDI and SSDIA, results are mixed. The 
SSDI provides better compression if the correlation of the data is significantly 
higher along the scan line than from one scan line to the next. For this reason, 
the presence of anomalous data, as in spectral band 2, produces a higher bit rate 
with the SSDIA than occurs with the SSDI. Although the implementation of the 
SSDIA is not significantly more complex than that of the SSDI, the SSDIA re- 
quires additional storage. The propagation of errors in the SSDI algorithm 
is constrained to a segment of a scan line between successive memory update 
points but errors in one scan line of data reconstructed by the SSDIA also affect 
reconstructed values in following scan lines due to the averaging operation 
performed. This imposes a requirement for a higher quality transmission 
channel for the SSDIA than needed for the SSDI to give a similar quality of 
reconstructed data. 
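
The confinement of errors between memory update points can be demonstrated with a simplified model. The sketch below is not the SSDI algorithm itself: it uses only a previous-sample difference along the scan line, sends a full reference sample at every update point, and injects a single error into one transmitted value. The update interval and all names are assumptions of the example.

```python
def delta_encode(line, update_interval):
    """Send a full sample at each update point and differences in between."""
    stream = []
    for x, v in enumerate(line):
        if x % update_interval == 0:
            stream.append(("ref", v))
        else:
            stream.append(("delta", v - line[x - 1]))
    return stream


def delta_decode(stream):
    """Rebuild the line by accumulating differences from each reference."""
    line = []
    for kind, value in stream:
        line.append(value if kind == "ref" else line[-1] + value)
    return line


def corrupted_positions(line, update_interval, error_at, error_value=1):
    """Positions whose reconstructed value is wrong after one channel error."""
    stream = delta_encode(line, update_interval)
    kind, value = stream[error_at]
    stream[error_at] = (kind, value + error_value)  # inject the error
    decoded = delta_decode(stream)
    return [x for x, (a, b) in enumerate(zip(line, decoded)) if a != b]
```

With a 16-sample line, an update interval of 8, and an error in sample 3, the corrupted samples are 3 through 7 only; the reference sample at position 8 restores correct reconstruction. A scheme that also draws on the previous line, as the SSDIA does, has no such boundary in the cross-line direction.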


The essentially information preserving SSDIM or SSDIAM algorithms 
yield significantly lower compressed bit rates than the strictly information 
preserving algorithms. The additional processing required for the mapping 
operation is negligible and the main consideration is the impact of the dis- 
tortion on users of ERTS data. From the preliminary results achieved during 
this investigation, it appears that an improved adaptive form of this algorithm 
could have potential application in the ERTS program, especially for data 
compressed and stored on the ground. 


Although only one set of tapes of spacecraft data was processed, 
results indicate that the algorithms give a comparable compression for 
spacecraft data as for ground processed data and the algorithm performances maintain 
the same order of ranking. The scene processed produced a somewhat lower 
compressed bit rate for spacecraft data than for ground processed data. It 
[...] result. 


4.2 IMPACT OF DATA COMPRESSION ON THE ERTS PROGRAM 
4.2.1 The Requirement for Data Compression in Future ERTS Missions 

The multispectral imaging sensors of ERTS-A generate tens of billions of 
bits daily. In succeeding missions, this figure will likely multiply by 
several orders of magnitude as higher resolution sensors and more spectral 
bands are added. Such volumes of data and the implied data rates present 
severe problems in communication links, in ground data processing, and in 
ground data storage and archiving. The actual feasibility of including an 
experiment may therefore be threatened by the large data rate and the accom- 
panying data handling and communication overload. In addition, several digital 
tapes must be provided each investigator for every scene he requests. 

Efficient source coding, i.e., data compression, can yield significant 
benefits and alleviate such problems by exploiting redundancies in the data to 
reduce the amount of data which must be transmitted, processed, and stored. The 
objective of source encoding is the exploitation of the statistical dependence 
between data samples so that only that information which is essential to 
the faithful reproduction of the image need be transmitted. 

The success of the earth resources survey programs will be measured 
to a large extent by the satisfaction provided users of the data. That 
satisfaction depends on how well the data, as formatted and processed, helps 
the individual accomplish tasks that are significant relative to his goals. 

Any processing performed on the data should be accomplished without sacrificing 
the information fidelity required by the user. 

Many forms of data compression have been studied, some simple and 
others quite complex, but not all of these techniques preserve information 
in the sense that the original data can be reconstructed with arbitrarily 
small error. A strictly information preserving data compression algorithm 
provides reconstructed data identical to the digital sensor data entering the 
compressor. Such a technique preserves the archived data and cannot be 
criticized by any user as invalidating his data requirements. On the other 
hand, strictly information preserving techniques are limited in the amount of 
compression available. 

Essentially information preserving compression permits a much higher 
compression to be obtained but permits a small amount of distortion in the 
reconstructed data. While an average compression much greater than 2:1 is 
difficult to achieve with strictly information preserving techniques, essen- 
tially information preserving algorithms can yield significantly higher com- 
pressions with only a modest degree of distortion. In many cases, if the com- 
pression is properly performed, much of the distortion actually arises from 
elimination of sensor noise rather than by the destruction of useful data. Such 
a form of essentially information preserving compression yields data which can 
be used by many of the scientific investigators, and the reconstructed image 
simply suffers an apparent slight decrease in signal-to-noise ratio which can 
possibly be improved by postprocessing techniques. 

The requirements for data compression are different, depending on 
whether the initial compression is performed on-board the satellite or on 
the ground. For on-board processing, size, power, weight, and complexity must 
be minimized; a more complex reconstruction may be used, however, since recon- 
struction is usually performed by a ground-based computer. Ground-based 
compression may be considerably more complex since it would be performed by a 
large computer which need not necessarily operate in real time. Reconstruction, 
in this case, usually is performed by the user, and an efficient rapid recon- 
struction is mandatory to reduce the user's cost. There is a definite need 
for more than one form of data compression. 

4.2.2 Application of this Investigation to ERTS Program 

This study concentrated on the low complexity SSDI algorithms combined 
with source encoding as a technique applicable to either the spacecraft or 
ground-based data compression of ERTS digitized imagery. The results are 
quite encouraging regarding the suitability of these techniques for applications 
in the ERTS program. 

The SSDI/Huffman algorithm forms the basis of an efficient technique 
for ground-based data compression and reconstruction that can be performed 
with a modest amount of computer processing. Results indicate that the 
average 100 x 100 nmi scene can be compressed to a single reel of magnetic 
tape and reconstructed with no loss of information. This reduction in storage 
from four reels of tape to a single reel yields economic benefits through a 
reduction in tape costs and in the tape storage facilities required while 
permitting a simplified archival procedure to be employed. 

The essentially information preserving SSDIM or SSDIAM algorithms can 
yield reconstructed data which has a fidelity acceptable for many users of 
ERTS data and allows an even greater compaction of data. An improved adaptive 
form of this algorithm may allow storage of two 100 x 100 nmi scenes on a 
single reel of tape. The distortion induced at present in the data by the 
ground decompression algorithms is comparable in degree to that produced by 
the SSDIM with |m| = 1. Proposed techniques for the geometric correction of 
ERTS imagery involve prediction or interpolation of corrected data intensities, 
a process which is also essentially information preserving. Instead of 
cascading the operations of decompression, geometric correction, and data compression, 
the three techniques could be combined into a composite algorithm which would 
increase processing efficiency while minimizing the data stored. 

The SSDI/Rice technique is a viable candidate for spacecraft applications, 
permitting a doubling of the data transmitted to ground. The reconstructed data 
is of high quality provided the channel bit error rate is 10^-6 or less. To 
further decrease the effects of channel errors and allow the use of downlinks 
having a higher bit error rate, error correction coding can be applied to the 
compressed data. 

In conclusion, the results of this study indicate that the use of 
data compression on ERTS data yields both economic and operational benefits. 
The tradeoff between strictly and essentially information preserving forms 
of the algorithms depends on the effect of slight distortion on the various 
uses of ERTS data. 



5. RECOMMENDATIONS FOR FUTURE WORK 


The present study has investigated both the degree and variation in 
compression for the SSDI algorithms using ground processed ERTS-1 tapes 
(and one spacecraft data tape). The results obtained in this study are 
relevant to the objectives set forth in the proposal but as system require- 
ments change in the future it is recommended that similar studies be 
performed. As future scanners are developed with different resolution, 
different quantizer, or a different set of spectral bands, the expected 
level of compression would change. The effects of these different para- 
meters are not completely clear at present but compression should increase 
as the average spectral band separation is narrowed and as system noise 
diminishes. Changes in ground processing algorithms can also affect the 
level of compression. Such system changes should be accompanied by a 
corresponding re-processing of the data with the same or appropriately 
modified compression algorithms to ascertain the compression and data 
statistics that result. It is probably sufficient to re-run only a few 
selected subscenes for purposes of comparison rather than a study of the 
present scope. 

While one spacecraft data tape was processed and compared to the 
equivalent ground processed tape during the current study, more conclusive 
results would be obtained by processing a number of these tapes. Such an 
investigation would yield a body of data of greater relevance to the use 
of data compression aboard a spacecraft than is provided by the present 
study. It is recommended that such a study be limited to simulation of 
the SSDI and SSDIA algorithms followed by Rice encoding since these appear 
to be the better candidates for spacecraft applications. 

A necessary precursor to the use of data compression aboard 
spacecraft is a more extensive study of the hardware implementation involved. 
The preliminary study of an implementation of the SSDI/Rice system, given 
in Section 3, should be refined and extended to a breadboard model. Such 
a study would develop a more comprehensive understanding of hardware per- 
formance and tradeoffs and permit a detailed comparison of actual with 
simulated performance, especially in the area of buffer parameters. 

Another area in which additional work is required is that of 
efficient ground-based compression and reconstruction of ERTS data. The 
computer programs used in the present study were developed for the 
CDC-6500 digital computer with the goal of obtaining a large number of 
statistical parameters characterizing the data and algorithm performance. The 
scope of the current investigation neither required nor permitted an 
optimization of these algorithms on the basis of ground processing efficiency. 

Based on the results obtained and the attendant benefits to archiving and 
tape transmittal, an investigation into the form such an operational system 
should have appears to be a logical extension of the current study. 
Appendix D describes some preliminary thoughts on a technique, based on the 
SSDI/Huffman compression algorithm, which permits a rapid processing of 
ERTS imagery applicable to both large scale computer systems and modest 
minicomputers. 

Although the current investigation concentrated on strictly 
information preserving techniques, the results obtained with the essentially 
information preserving SSDIAM algorithm indicate that a slight level of 
distortion, properly introduced, may yield reconstructed imagery acceptable 
to many users of ERTS data. A further investigation is recommended to determine 
the level and type of distortion incurred by algorithms such as the SSDIAM and 
the block-interpolation SSDI and the effects of such processed data on various 
investigations, including both photographic interpretation and computer-aided 
interpretation of the data. Such a study could lead to an optimized algorithm 
which would not compromise the intended use of the data any more than such 
standard processing techniques as geometric correction, while yielding 
substantially higher compression than is possible with strictly information 
preserving techniques. 



APPENDIX A 


PHOTOS 


4 Spectral Bands, Processed as Scene 31 

4 Spectral Bands, Processed as Scene 32 

4 Spectral Bands, Processed as Scene 33 

4 Spectral Bands, Processed as Scene 34 

Original Data (Scene 31), Band 2 

Original Data (Scene 31), Band 3 

Original Data (Scene 31), Band 4 

Essentially Information Preserving Data, |m|=1 (Scene 31), Band 1 

Essentially Information Preserving Data, |m|=1 (Scene 31), Band 2 

Essentially Information Preserving Data, |m|=1 (Scene 31), Band 3 

Essentially Information Preserving Data, |m|=1 (Scene 31), Band 4 

Essentially Information Preserving Data, |m|=3 (Scene 31), Band 1 

Essentially Information Preserving Data, |m|=3 (Scene 31), Band 2 

Essentially Information Preserving Data, |m|=3 (Scene 31), Band 3 

Essentially Information Preserving Data, |m|=3 (Scene 31), Band 4 



APPENDIX B: COMPUTER OUTPUT 


This section contains the set of computer print output resulting 
from the operation of programs DCSTAT1 and DCSTAT2 on the Principal Full 
Scene (number 31). 

[Computer printout, not legible in this copy: Huffman code tables giving, 
for each difference level from -20 to +20, the symbol probability, the code 
length, and the code word, together with the average code length and the 
entropy for each code.] 

[Computer printout, not legible in this copy: SSDI compression statistics 
listed by scan line, giving bit counts and the bit rates in bits/pixel for 
the Huffman, adaptive Huffman, and Rice codes.] 



APPENDIX C 

ALGORITHM FOR UNPACKING AND DECODING 


The algorithm shown in Figure C-1 unpacks and decodes the packed, 
encoded data samples. Reading and writing of data blocks is double 
buffered and overlapped with processing. A table look-up operation is 
performed to decode compressed picture information and obtain an index 
for unpacking. The compressed picture information is further processed 
to reconstruct each picture. 

The algorithm details the time critical operations of unpacking 
and decoding. The following discussion explains how unpacking and 
decoding can be performed on a minicomputer to minimize time and cost. 

For purposes of discussion, data is read from one of two blocks (j 
pointer) for processing and written to one of two blocks (k pointer). 
The four data blocks reside in the main memory to buffer information 
between the processor and input/output devices. 

Eight registers contain the immediate information required for 
processing successive data samples. The following table describes the 
role of each register. The word length of the machine is denoted 
by n. 


R1      accumulator. 

R2      s, the shift index for unpacking. 

R3, R4  a and b, a register pair containing from n to 2n bits of 
        packed, encoded data samples. The high order bit can be 
        shifted left out of b into the low order bit position in a. 

R5      d, the number of left-justified data bits remaining in b. 

R6      i, the index into block j for moving data to register b. 

R7      j, the input data block pointer. 

R8      k, the output data block pointer. 
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At START, data is read into the input buffer, pointers are 
initialized, and the first word is copied into a from the buffer. 

The N=12 high order bits of a are used as an index to a table to 
obtain the data value (v) and length (s) of the first code word. 

The value v is used in picture reconstruction according to the inverse 
SSDI algorithm as previously discussed. The length of the code word, s, 
is used in the following sequence of steps to unpack the next code 
word. The parameter s represents the number of bits that the 
register pair a-b must be shifted left. Before such shifting, 
however, it is necessary to determine when another word of packed, 
encoded data samples must be loaded into the b register. This 
question is resolved by comparing s and d (the number of data bits 
in b). In the general case s is less than d, so d is decremented, 
shifting is performed, and the algorithm returns to a to repeat 
the table look-up for the 12 high-order bits of a. 

If s equals d, d is reset to the machine word length, a-b is 
shifted left s bits, and the next word from the input buffer is 
loaded into b. Before returning to a, the index i is incremented 
to point to the next word in the input buffer or another block is 
read if the current input buffer is empty. 

If s is greater than d (initially d = 0), a-b is shifted until 
b is empty, b is loaded with another word, and a-b is shifted left 
the remaining s-d bits, which leaves n-(s-d) data bits in b. The 
index i is incremented or another block is read before returning to a. 

The critical path in this algorithm is the case where s is less 
than d, because the average code word length is less than one-third 
the machine word length. This path is shorter than the other two 
because no indexing or reading is required to service the b register. 

The next most frequently used path corresponds to the case of s 
greater than d. In all cases the alignment of code words in a is 
expedited by using the registers for immediate processing instead of 
accessing memory. 
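
The three cases above can be illustrated with a short simulation. This
is only a sketch under stated assumptions, not the flow of Figure C-1
itself: Python integers stand in for the registers a, b, d and i, the
double buffering and the block pointers j and k are left out, and the
four-symbol code, the values N = 3 and n = 8, and the helper names
build_table, pack, shift_pair and unpack are hypothetical choices made
for illustration. Two dummy zero words are appended so that the final
loads of b have something to read.

```python
def build_table(code, N):
    """Map every N-bit address to (v, s) for the code word that starts it."""
    table = [None] * (1 << N)
    for v, bits in code.items():
        free = N - len(bits)                      # bits following the code word
        base = int(bits, 2) << free
        for tail in range(1 << free):
            table[base + tail] = (v, len(bits))
    return table


def pack(values, code, n):
    """Concatenate code words and cut the result into n-bit words."""
    bits = "".join(code[v] for v in values)
    bits += "0" * (-len(bits) % n)                # zero-fill the last word
    return [int(bits[k:k + n], 2) for k in range(0, len(bits), n)]


def shift_pair(a, b, k, n):
    """Shift the 2n-bit register pair a-b left by k bits."""
    pair = (((a << n) | b) << k) & ((1 << (2 * n)) - 1)
    return pair >> n, pair & ((1 << n) - 1)


def unpack(words, count, table, N, n):
    """Decode `count` values from n-bit words, following the three cases."""
    words = list(words) + [0, 0]                  # dummy words past end of data
    a, b = words[0], 0                            # START: first word into a
    d, i = 0, 1                                   # b is empty; next word is 1
    out = []
    for _ in range(count):
        v, s = table[a >> (n - N)]                # look up N high-order bits
        out.append(v)
        if s < d:                                 # common case: b has enough
            d -= s
            a, b = shift_pair(a, b, s, n)
        elif s == d:                              # b is exactly used up
            a, b = shift_pair(a, b, s, n)
            b, i, d = words[i], i + 1, n          # refill b, so d becomes n
        else:                                     # b runs out part way
            a, b = shift_pair(a, b, d, n)
            b, i = words[i], i + 1
            a, b = shift_pair(a, b, s - d, n)
            d = n - (s - d)
    return out


code = {0: "0", 1: "10", -1: "110", 2: "111"}     # toy code, not from the report
values = [0, 1, -1, 2, 0, 0, 1, 2, -1, 0, 0, 1]
words = pack(values, code, n=8)
decoded = unpack(words, len(values), build_table(code, N=3), N=3, n=8)
```

With this input the loop passes through all three branches, and
decoded equals values. In the report's setting N would be 12 and n the
word length of the minicomputer.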
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APPENDIX D 

EFFICIENT GROUND-BASED DATA COMPRESSION OF IMAGES 


The volume of data gathered by earth observation satellites such as 
the Earth Resources Technology Satellites (ERTS) produces major data handling 
problems both in ground data archiving and in the transmission of the data to 
the various investigators. Since each ground scene (100 nmi x 100 nmi) 
occupies four computer tapes, tape cost and storage facilities for the data 
produced become significant. The use of ground-based data compression can 
diminish the magnitude of these problems. 

To be acceptable for such processing, several requirements must be met 
by the compression algorithm. Of primary importance, the data compression 
technique must permit reconstruction of the data with archival fidelity, i.e., 
no error can be introduced into the data. The strictly information preserving 
SSDI and SSDIA algorithms meet this requirement. 

A second constraint on the processing technique requires a moderately 
simple algorithm for compressing the ERTS data received on the ground and a 
very efficient reconstruction algorithm. The processing time required for 
the initial compression, which is performed only once, can be substantially 
longer than the reconstruction time, since the reconstruction of a given scene 
may be performed many times by different investigators. 

The reconstruction will also be performed by different user processing 
facilities ranging from modest minicomputers to large computer systems. The 
algorithm should be applicable to virtually any computer, setting limits on 
storage requirements, word size, and the instruction set used. 

A final requirement is that the compression achieved be sufficient 
to justify the added processing time required for the reconstruction. A 
compression of 2:1 would permit a saving of three magnetic tapes per scene. 

The SSDI transform combined with Huffman encoding forms the basis of a 
ground-based data compression system which can fulfill the above requirements. 
The SSDI is a very efficient algorithm, requiring only 7/4 subtracts/sample 
on the average, and the inverse SSDI requires the same number of adds for 
reconstruction. This rapid encoding and reconstruction, combined with 
table look-up Huffman encoding and decoding of the data, yields an algorithm which 
requires a minimal number of machine operations for processing the data while 
giving an information preserving data compression comparable to that achieved 
by more sophisticated algorithms. Huffman coding can be performed based on 
the statistics of the entire scene or adaptively, based on the statistics of 
blocks of data. Although locally adaptive Huffman coding gives a higher 
compression than coding based on global statistics of the entire scene, 
adaptive coding entails an increased complexity when encoding and decoding 
the data. Global Huffman coding is quite attractive for compression of 
ERTS data since the entire decoding operation is based on a single code. 

The SSDIA algorithm permits a slightly higher degree of compression at the 
expense of increased storage and processing time. To perform the averaging 
operation, an entire scan line must be stored in each spectral band and an 
averaging operation performed for each input intensity. This additional 
load is negligible for the large computer used for encoding the data but 
could be a burden for reconstruction, especially if a small minicomputer 
is used with limited storage capabilities. 

A second problem arises if SSDIA is used. Since the reconstruction 
algorithm of the SSDIA data presupposes information regarding the intensities 
of the pixels in the previous scan line, decoding must proceed from the first 
scan line of the image. If a user wanted to reconstruct only a portion of 
the data for his investigation, the SSDIA forces him to initially reconstruct 
all four tapes before he can select the area of interest. SSDI permits 
the operator to start at any scan line he desires and reconstruct only the 
segment of the tape he requires. 





D.1 SSDI/HUFFMAN ENCODING OF MSS TAPES 

Generation of the Huffman coded tapes from the MSS tapes would be 
performed once for each ERTS image received. The encoding operation should 
be efficiently performed with a minimum of operator interaction and the encoded 
data must be in a form that allows rapid decoding at user facilities. A 
technique will be described that meets these requirements while halving 
the number of tapes required for archiving and transmittal to the users 
of the data. Figure D-1 presents the basic data flow required for encoding 
the MSS tapes. The original four MSS tapes describing the scene are 
loaded onto the computer which subsequently generates the sequence of 
SSDI symbols. These symbols and auxiliary overhead data are written on 
intermediate tapes. Concurrent with the generation of the SSDI symbols, 
the probability of occurrence of each symbol is computed and stored in an 
array PSSDI. This array is of length 256 to include all possible SSDI 
levels which might occur. 



Figure D-1. Flow Chart for Encoding Data 

After the input tapes have been read and the intermediate tapes generated, 
the distribution of SSDI symbols for the entire scene is used to generate the 
Huffman code to be used. The subroutine CODE generates the Huffman code 
desired and generates a table HUF of length 256. Each entry of table HUF 
contains the Huffman code associated with a particular SSDI symbol. 
Subroutine DECODE generates a table D which will be used by the decoding 
algorithm to enable table look-up reconstruction of the data at the user's 
processing facility. 
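
The step from symbol counts to a code table can be sketched as follows.
This is the textbook Huffman construction, not the efficient TRW
version of subroutine CODE, and it does not enforce the N-bit limit on
code word length discussed below. The function names symbol_histogram
and huffman_table are hypothetical, code words are held as bit strings
for readability, and the symbols are assumed to have been offset so
that they index the range 0 to 255.

```python
import heapq


def symbol_histogram(symbols, levels=256):
    """Count occurrences of each SSDI symbol (the role of array PSSDI)."""
    pssdi = [0] * levels
    for s in symbols:
        pssdi[s] += 1
    return pssdi


def huffman_table(pssdi):
    """Return {symbol: bit string} for every symbol with a nonzero count."""
    # Heap items: (count, smallest symbol in group, symbols in group).
    heap = [(c, s, [s]) for s, c in enumerate(pssdi) if c > 0]
    huf = {s: "" for _, s, _ in heap}
    if len(heap) == 1:                       # degenerate one-symbol source
        return {heap[0][1]: "0"}
    heapq.heapify(heap)
    while len(heap) > 1:
        c0, t0, g0 = heapq.heappop(heap)     # two least probable groups
        c1, t1, g1 = heapq.heappop(heap)
        for s in g0:
            huf[s] = "0" + huf[s]            # extend code words on the left
        for s in g1:
            huf[s] = "1" + huf[s]
        heapq.heappush(heap, (c0 + c1, min(t0, t1), g0 + g1))
    return huf


symbols = [128] * 8 + [129] * 4 + [127] * 2 + [130, 126]   # made-up sample
pssdi = symbol_histogram(symbols)
huf = huffman_table(pssdi)
average_bits = sum(pssdi[s] * len(bits) for s, bits in huf.items()) / len(symbols)
```

For this made-up sample the most frequent symbol receives a one-bit
code word and the two rarest receive four-bit code words.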

The decoding table D is initially written as the first file on the output 
tape CMSS1. Following this, the Huffman encoded bit stream is written on the 
remainder of CMSS1 and CMSS2. The intermediate SSDI tapes and the table HUF 
are used concurrently to generate this compressed data. Each SSDI symbol 
read from the intermediate tape is used as the index into array HUF. The 
entry at that index is the binary Huffman code associated with the symbol, 
and that code word is written onto CMSS1 or CMSS2. The code words are 
allowed to overlap computer words in order to obtain a maximum compaction 
of the data. 
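
A minimal sketch of packing code words so that they overlap computer
words is given here. It assumes that each HUF entry holds a pair (code
bits, length), that the word length is a parameter word_bits, and that
the last word is zero-filled; the function name pack_codewords and the
toy table are hypothetical.

```python
def pack_codewords(symbols, huf, word_bits=16):
    """Pack variable-length code words into fixed-length words."""
    words = []
    acc = 0          # bits collected so far that are not yet written out
    filled = 0       # number of valid bits in acc
    for s in symbols:
        code, length = huf[s]                    # table look-up by symbol
        acc = (acc << length) | code
        filled += length
        while filled >= word_bits:               # a full word is available
            filled -= word_bits
            words.append(acc >> filled)          # emit the oldest word_bits bits
            acc &= (1 << filled) - 1             # keep the overflow bits
    if filled:                                   # left-justify the final bits
        words.append(acc << (word_bits - filled))
    return words


# Toy table: symbol -> (code bits, code length). Not taken from the report.
huf = {0: (0b0, 1), 1: (0b10, 2), 2: (0b110, 3), 3: (0b111, 3)}
packed = pack_codewords([1, 2, 3, 0, 1], huf, word_bits=8)
```

Here the five code words total 11 bits, so the third code word ends
exactly at the first word boundary and the remaining three bits are
left-justified in a second word.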

To simplify reconstruction, several constraints are imposed on the code 
words assigned to SSDI symbols. First, no code word is allowed to have more 
than N bits. The choice of N will be discussed later. To permit this, low 
probability symbols are assigned a Huffman coded prefix C_L of at most N-8 
bits. This prefix is then followed by the true eight bits of the symbol to 
allow separation of the symbols lumped under this prefix.* An efficient 
version of the Huffman coding algorithm has been developed and validated 
at TRW, as described in Section 2.2. 

* This technique preserves the instantaneous property of the code. 

At the beginning of a scan line, the first picture element in each 
spectral band is transmitted as a direct 7-bit value. Occurrence of the 
beginning of a scan line is designated in the coded bit stream by C_L 
followed by eight ones. The next twenty-eight bits are the intensities of the 
first elements. All other picture elements in the scan line are encoded by 
the SSDI. 
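
The prefix mechanism and the scan line marker can be sketched as below.
The dictionary layout, the key name ESCAPE, the function names and the
toy code are hypothetical. The report does not say how the all-ones
pattern is kept distinct from a genuine rare symbol, so this sketch
simply assumes that pattern is reserved for the marker.

```python
ESCAPE = "C_L"          # hypothetical dictionary key for the prefix C_L
LINE_START = "1" * 8    # eight ones following C_L mark a new scan line


def encode_symbol(symbol, code):
    """Return the bit string for one 8-bit SSDI symbol."""
    if not 0 <= symbol < 256:
        raise ValueError("symbol must fit in eight bits")
    if symbol in code:
        return code[symbol]                      # frequent symbol: own code word
    bits = format(symbol, "08b")
    if bits == LINE_START:                       # assumption: pattern is reserved
        raise ValueError("all-ones pattern is reserved for the scan line marker")
    return code[ESCAPE] + bits                   # rare symbol: prefix + true bits


def encode_line_start(first_pixels, code):
    """Return C_L, eight ones, then the four 7-bit first-pixel intensities."""
    if len(first_pixels) != 4 or not all(0 <= p < 128 for p in first_pixels):
        raise ValueError("expected four 7-bit intensities")
    body = "".join(format(p, "07b") for p in first_pixels)
    return code[ESCAPE] + LINE_START + body


# Toy code with a 3-bit prefix, so the longest code word has N = 11 bits.
code = {0: "0", 1: "10", 2: "110", ESCAPE: "111"}
marker = encode_line_start([17, 42, 5, 127], code)
rare = encode_symbol(200, code)
```

Because the prefix is itself a code word of the Huffman code, every
escaped symbol and the marker remain uniquely decodable, and no code
word exceeds the prefix length plus eight bits.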

The table generated by subroutine DECODE contains an array of 2^N entries. 
The address of each entry corresponds to a sequence of N bits and the entry 
itself contains two quantities of information required by the reconstruction 
algorithm. One quantity is the first decodable symbol in that string of bits. 
Since the number of bits in an allowable Huffman code word can vary from one to 
N bits, each table entry is guaranteed to contain a decoded symbol. Although 
on the average several decodable symbols are contained in N bits, it appears 
that one decode per table entry permits a more economical reconstruction 
algorithm. The second quantity contained in a table entry is the number of 
shifts required to position the next decodable word so that it will begin 
in the first bit position of the next N bit string. 
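
One way such a table could be filled is sketched below, assuming each
entry packs the 8-bit symbol above a 4-bit shift count and that code
words are given as bit strings. The names build_decode_table and lookup
and the toy code are hypothetical, and subroutine DECODE itself is not
reproduced in this appendix. Since the prefix for rare symbols has at
most N-8 bits, the prefix and the eight bits that follow it fit inside
one N-bit address, so a single look-up suffices for those symbols too.

```python
def build_decode_table(code, escape_prefix, N=12):
    """Build the 2**N entry table: address -> (symbol << 4) | shift."""
    if len(escape_prefix) > N - 8:
        raise ValueError("escape prefix must have at most N-8 bits")
    table = [0] * (1 << N)

    def fill(bits, symbol):
        if len(bits) > N:
            raise ValueError("code word longer than N bits")
        free = N - len(bits)                  # address bits after the code word
        base = int(bits, 2) << free
        for tail in range(1 << free):         # every completion of the prefix
            table[base + tail] = (symbol << 4) | len(bits)

    for symbol, bits in code.items():         # frequent symbols
        fill(bits, symbol)
    for symbol in range(256):                 # prefix followed by the true bits
        fill(escape_prefix + format(symbol, "08b"), symbol)
    return table


def lookup(table, address):
    """Return (first decodable symbol, number of shifts) for an address."""
    entry = table[address]
    return entry >> 4, entry & 0xF


code = {0: "0", 1: "10", 2: "110"}            # toy code, not from the report
table = build_decode_table(code, escape_prefix="111", N=12)
symbol, shifts = lookup(table, int("10" + "0110100111", 2))
```

In this toy case any address beginning with the bits 10 returns symbol
1 and a shift of 2, whatever the remaining ten bits are.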

As an example, suppose that the first decodable word in the N bit 
sequence was of length K bits. The table entry addressed by these N 
bits would return SSDI symbol S_i, corresponding to codeword C_i, and the 
number of shifts, K, required to position the next codeword as the header 
of the subsequent N bit sequence.* 

* This encoding operation takes into account the operations which must be 
performed by the decoding algorithm. 

The SSDI algorithm is very fast, requiring seven subtract operations per 
set of four input intensity samples. The time required for these operations 
and the associated fetches from local storage is less than the time required 
for reading in the four input MSS tapes, implying that this stage of 
processing is limited by tape speed. The calls to subroutines CODE and 
DECODE required a total time of about three seconds. Generation of the 
Huffman coded bit stream using the table look-up technique is limited by the 
time required to read in the intermediate SSDI tapes and write tapes CMSS1 
and CMSS2. Thus, the time for the entire encoding operation is essentially 
the same as the tape handling time. 

The storage required for encoding tapes is minimal and is easily 
accommodated by any large computer. The generation of the SSDI symbols and the 
intermediate tapes requires the storage of eight input samples and four 
differences. The probability array PSSDI is of length 256. Array HUF is 
of length 256. Array D is of length 2^N, where N may typically equal 12. 
Each entry of D contains 8 bits giving the symbol decoded, followed by 4 
bits giving the number of shifts required. The Huffman coding routine requires 
an additional array storage of length 1024. 

D.2 DECODING AND RECONSTRUCTION PROCEDURE 

The primary requirements imposed on the decoding algorithm are: 

• Rapid reconstruction of the digital data 

• Required storage within the limitations of modest minicomputer 
systems 

• All computations and arrays limited to the single precision 
word length of the machines used 

• Minimal operator interaction required. 

These constraints will ensure that the algorithm selected can be implemented 
at all user installations. The goal of the algorithm development is a 
reconstruction technique that takes about the same time for reconstructing 
the compressed tapes as required for reading the original MSS tapes and 
which would require negligible additional effort by the computer operator. 

Since there are over 27 million intensity samples per 100 nmi x 100 nmi 
scene, a very fast algorithm is necessary to minimize the number of machine 
operations required to decode each sample. Knowledge of the computer 
structure is essential so that an optimum flow can be established and an 
efficient division made between input/output operations and central processing. 
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The time of critical machine operations that are performed often must be 
minimized, perhaps at the expense of lengthening operations that are 
performed infrequently. One example of this involves the minimization of 
normal bit decoding at the expense of increasing the time for restoring pixel 
intensities at the beginning of a scan line since the latter procedure occurs 
infrequently. 

A block diagram of the essential reconstruction flow is given in Figure D-2. 
A more detailed discussion of the machine operations involved in decoding is 
given in Appendix C. 



Figure D-2. Flow of the Decoding Algorithm 

The input tapes contain the encoded bit stream in which no Huffman codeword 
contains more than N bits. A sequence of M > N bits is extracted at a 
time from the input tapes and stored in an input register. The value of M 
depends on the word length of the computer used and the number of registers 
available. Parameter M should be as large as possible in order to minimize 
the number of reads from the input tape, which consume valuable processing time. 

A sequence of N bits is extracted from the input register by appropriate 
masking operations. Decoding Table D, which has been read from the first file 
on the input tapes and put into local storage, is used to determine the first 
symbol contained in the N bit sequence. The N bit sequence is used as the 
address to access the proper table entry. This entry returns 12 bits of 
decoding information. The first 8 bits represent the first symbol V_i 
in the sequence. The next 4 bits represent the number of shifts S_i required 
to position the next N bit sequence for decoding. If symbol V_i has been 
coded with L_i bits, then S_i = L_i ≤ N. 

Symbol V_i is used to reconstruct the next intensity by the inverse SSDI 
algorithm. Registers can be used to store the four previously decoded 
intensities I. The four SSDI differentials, packed consecutively on the 
input tapes, are added in the appropriate sequence to the previously reconstructed 
intensities to form the current intensities, which then replace the previous 
values in the registers. The inverse SSDI reconstruction takes seven 
additional operations per set of four intensities. The reconstructed 
intensities are placed in a buffer for subsequent writing onto the four 
reconstructed MSS tapes. 

In preparation for the next decoding operation, S_i bits are shifted out 
of the input buffer into the N bit register used to address Table D. The 
number of shifts S_i must be compared to the number of bits left in the input 
buffer to determine if the buffer contains at least S_i bits. If not, more 
data must be read into the buffer from the input tapes before the shifting 
occurs. 
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The normal algorithm flow, described above, occurs for all intensity 
values except for the set of four intensities at the beginning of a new scan 
line. A unique Huffman coded prefix is generated by the encoder at the beginning 
of each new scan line, and this prefix is followed by 28 bits which are the 
first four 7-bit intensity values in the scan line. Detection of this scan 
line prefix code by the decoding algorithm loads the following four intensities 
into the appropriate I registers, replacing the previous intensities stored 
there for the last four intensities in the preceding scan line. 

D.3 SUMMARY 

The encoding and decoding technique presented forms a practical 
and efficient method for the ground-based data compression of multispectral 
data with archival fidelity. The simplicity of the SSDI algorithm and the rapid 
decoding permitted by a Huffman table look-up form the basis for an algorithm 
which can be rapidly performed. 

The determination of the optimum block size N of code words is an 
important consideration. As N increases, compression increases, but Table D 
doubles in length each time N increases by one. Parameter N should be that 
value which allows an average data compression of at least 2:1 while permitting 
an array size D that can be accommodated by modest minicomputer systems. 

A value of N = 12 is currently being used. 
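
The storage side of this trade-off is simple arithmetic, sketched here
under the 12-bit entry layout given in Section D.1 (8 symbol bits plus
4 shift bits). The function name table_d_size is hypothetical, and how
entries are actually laid out in memory words is not specified in this
appendix, so only entry and bit counts are computed.

```python
def table_d_size(N, symbol_bits=8, shift_bits=4):
    """Return (number of entries, total bits) for a Table D addressed by N bits."""
    if N > (1 << shift_bits) - 1:
        raise ValueError("a shift count of N does not fit in shift_bits")
    entries = 1 << N                      # doubles each time N grows by one
    return entries, entries * (symbol_bits + shift_bits)


for N in range(10, 15):
    entries, bits = table_d_size(N)
    print(f"N = {N:2d}: {entries:5d} entries, {bits // 8:5d} bytes if tightly packed")
```

For N = 12 this gives 4096 entries, or 49152 bits. The 4-bit shift
field also bounds N at 15, since a code word of N bits needs a shift
count of N.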

Further simulation and optimization of the reconstruction algorithm will 
permit a determination of the minimal set of computations required. A study 
of available minicomputers would permit refining the set of machine instructions 
used, the number of registers allotted for local storage, and the allowable 
storage for Table D. In addition, the time required for reconstruction can 
be estimated for several typical computers. 
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While this discussion has concentrated on the use of the SSDI algorithm 
with Huffman coding based on the statistics of the entire scene, further study 
should be performed on the SSDIA algorithm and adaptive Huffman coding for ground 
reconstruction. These latter techniques permit an increased compression, but 
require a more complex reconstruction algorithm implying a longer reconstruction 
time. In addition, the SSDIA requires the storage of an entire scan line in 
each spectral band. The compression of four MSS tapes into one compressed 
tape provides both economic and archival benefits, assuming the use of efficient 
compression techniques applicable to a wide variety of computers. While further 
investigation and simulation are required, the proposed technique seems to 
represent a viable candidate for the ground compression of MSS digital tapes 
with archival fidelity. 
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